**Demographic Research Monographs**

Hal Caswell

# **Sensitivity Analysis: Matrix Methods in Demography and Ecology**

# **Demographic Research Monographs**

A Series of the Max Planck Institute for Demographic Research

**Editor-in-chief**

Mikko Myrskylä, Max Planck Institute for Demographic Research, Rostock, Germany

More information about this series at http://www.springer.com/series/5521


Hal Caswell, Biodiversity & Ecosystem Dynamics, University of Amsterdam, Amsterdam, The Netherlands

ISSN 1613-5520; ISSN 2197-9286 (electronic). Demographic Research Monographs. ISBN 978-3-030-10533-4; ISBN 978-3-030-10534-1 (eBook). https://doi.org/10.1007/978-3-030-10534-1

Library of Congress Control Number: 2018966869

© The Editor(s) (if applicable) and The Author(s) 2019. This book is an open access publication. **Open Access** This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

*For Moira*

## **Preface**

Sensitivity analysis addresses one of the most persistent of all questions: what would happen *if* ? Within the field of demography, sensitivity analysis might be said to have originated with the groundbreaking, yet very different, papers of Hamilton (1966) and Keyfitz (1971). Hamilton calculated the sensitivity of the intrinsic rate of increase, *r*, to changes in age-specific mortality. He interpreted *r* as a measure of individual fitness, capturing the effects of the phenotype on mortality and fertility. The resulting sensitivities are measures of the strength of natural selection on aging and senescence. Keyfitz calculated sensitivities of population growth rate, life expectancy, and other quantities. Taking a demographic perspective, he interpreted the results as showing the linkage between age-specific rates at the individual level and the "intrinsic" rates expressed at the population level. Both these perspectives on sensitivity analysis continue to play major roles in demography and population biology. Connecting traits to individual rates, and those rates to measures of fitness, is the foundation of evolutionary demography. Understanding linkages between individual rates and population outcomes informs population projections, policy and spending, conservation, health demography, ecotoxicology, and so on.

Fast forward to today. The diversity of demographic models, of the outcomes that can be calculated, and the power of the mathematical tools available to analyze them far exceed those of 50 years ago. Much of this progress is due to the formulation of demographic models in terms of matrices. P. H. Leslie formulated matrix models in the 1940s (Leslie 1945), but they were mostly ignored for two decades until revitalized by a series of studies in the 1960s (Keyfitz 1964; Lefkovitch 1965; Rogers 1968). In the very first issue of the first volume of the new journal *Demography*, Nathan Keyfitz described population projection as a matrix operator (Keyfitz 1964). This book relies on matrix formulations generalized beyond projections to age-structured and stage-structured populations, to linear and nonlinear dynamics, to time-invariant and time-varying vital rates, and to multistate models that combine age and stage information.

The matrix formulation provides easily computable outcomes at the level of the individual (e.g., risks of mortality, longevity, lifetime reproduction), the cohort (e.g., distributions of age or stage at death), and the population (e.g., population growth rate). The mathematical connection between matrix models and the theory of finite-state Markov chains makes it possible to go beyond expected outcomes to calculate variances and higher moments and to take full advantage of the stochasticity of demographic events at the individual level (individual stochasticity).

The sensitivity analysis of these diverse outcomes is made possible by the even more recently developed mathematical tool of matrix calculus (Magnus and Neudecker 1988). Matrix calculus permits easy differentiation of scalar-, vector-, and matrix-valued functions of scalar-, vector-, and matrix-valued arguments. This entire book is an application of these methods to demographic problems.

**Organization** The book is (imperfectly) divided into five parts. Part I contains an introduction and a summary of the matrix calculus methods that are used throughout the book.

Part II analyzes linear models for population growth, longevity, and reproduction. In linear models, the per-capita vital rates are independent of population size and structure. When the rates are also time-invariant, these models lead to a stable age or stage structure and exponential growth. The rate of growth is one of the most fundamental outcomes of stable population theory. Chapter 3 analyzes the sensitivity of population growth rate from three directions: differentiation of the characteristic equation, eigenvalue perturbation theory, and matrix calculus, providing the first application of the methods that form the basis of the subsequent chapters. Chapter 4 focuses on longevity, presenting the sensitivity analysis of life expectancy, variance in longevity, and life disparity. Chapter 5 introduces the important concept of individual stochasticity (stochastic outcomes of probabilistic transitions in the life cycle) and explores its effects on longevity, net reproductive rate, birth intervals, and age at reproduction. Some aspects of time variation are introduced, including the first appearance in the book of the powerful vec-permutation matrix method to describe temporally varying environments.

A critical first step in the construction of any demographic model is the choice of the individual state (i-state) variables that capture the relevant information about individuals. Age, developmental stage, body size, and a variety of other properties have been used as i-states. However, it is often the case that a combination of age and some other characteristic is necessary to describe individuals. Chapter 6 presents the sensitivity analysis of such models, using the vec-permutation method to construct multistate models and matrix calculus to differentiate the results.

Part III relaxes the assumption of time invariance. Chapter 7 presents the sensitivity analysis of transient dynamics, i.e., dynamics that happen in the short term, before asymptotic behavior appears. Short-term population growth and structure may differ in important ways from the growth and structure implied by stable population theory. Chapter 7 explores these differences, for cases where the vital rates may be fixed, varying, or even nonlinear. Chapter 8 analyzes periodic models. Such models appear in a variety of guises: as matrix products describing periodic (e.g., seasonal) environmental variation and as matrix products describing distinct processes embedded within an apparently single projection matrix and in the construction of multistate matrix models. In each case, the goal is to describe the sensitivity of some overall outcome, calculated from the entire periodic matrix product, to changes in parameters affecting each component of the matrix. Chapter 9 analyzes population growth in stochastic environments and the problem of decomposing differences in stochastic growth rates into components due to the environment and to the vital rates. This requires a combination of the first-order approximate decomposition known as life table response experiment (LTRE) analysis with the more specialized Kitagawa-Keyfitz decomposition and has potential implications far beyond the stochastic environment case.

Part IV analyzes nonlinear models, including density-dependent models, frequency-dependent models (e.g., models for the interaction of the sexes), nonlinear models for subsidized populations, and a nonlinear approach to the sensitivity of the stable structure and the reproductive value of linear models.

Finally, Part V returns to the analysis of the Markov chain models that form the basis of many of the demographic calculations throughout the book. These chapters take a more mathematical approach to the sensitivity analysis of Markov chains, including some aspects that have yet to find wide demographic application (but the potential is there). Chapter 11 analyzes discrete-time chains, both the absorbing chains familiar in demography (death is an absorbing state in most models) and ergodic chains that include no absorbing states. Chapter 12 presents the sensitivity analysis of continuous-time absorbing Markov chains, using as an example a model for the stages of colorectal cancer.

Most of the chapters here are based on, or extended from, papers that have appeared in a variety of journals in ecology, population biology, human demography, and applied mathematics. There is overlap among the chapters. This is a feature, not a bug, because it means that similar calculations are revisited with different perspectives, different derivations, and different examples. When choices arose, I tried to choose the presentation that would make things easier for the reader.

The material here certainly does not exhaust the applications of matrix calculus in the sensitivity analysis of demographic models. I have tried to point out directions for further development.

#### **Bibliography**


Amsterdam, The Netherlands Hal Caswell

## **Acknowledgements**

Science is not done alone, and I owe many thanks to institutions, funding sources, and people.

**Institutions** Many of these ideas were developed at the Max Planck Institute for Demographic Research (MPIDR). The connections between the demography of humans, plants, and animals<sup>1</sup> are not always recognized or appreciated by either biologists or human demographers. Under the direction of James Vaupel, the MPIDR has shown just how powerful these connections can be, and I have benefited enormously from the hospitality there. There is no place like it.

The Woods Hole Oceanographic Institution (WHOI) provided me with the flexibility to follow scientific ideas wherever they go, on land or sea. I am extremely grateful for this freedom. The University of Amsterdam has been my academic home for the last 5 years, and I must particularly thank André de Roos and the Theoretical Ecology Group there for creating such a great environment in which to do population research. The institutional support of MPIDR, WHOI, and the University of Amsterdam has made this book possible.

**Funding** Over the years in which much of this work was carried out, I was supported by a series of grants from the US National Science Foundation, including Grant DEB-1119774 from the OPUS program, which supported the start of the book. I am grateful for the willingness of NSF to support theoretical ecological research. The Woods Hole Oceanographic Institution provided financial support through an Ocean Life Fellowship and the Robert W. Morse Chair for Excellence in Oceanography. I am especially grateful for a research award from the Alexander von Humboldt Foundation, which funded a lengthy stay at the MPIDR. Last but definitely not least, I am grateful for support from the European Research Council, under the European Union's Seventh Framework Programme (FP7/2007–2013), through ERC Advanced Grant 322989 *Individual Stochasticity and Population Heterogeneity in Plant and Animal Demography*. This grant and the team that it permitted me to assemble were essential to this research.

<sup>1</sup>Yes, I know, humans are animals. But it is just unbearably clumsy to write "human and non-human animals" every time.

**People** There is a long list of people who deserve thanks (but no blame) for this book. Special thanks to my research group at the University of Amsterdam: Silke van Daalen, Charlotte de Vries, Gregory Roth, Nienke Hartemink, Nora Sanchez Gassen, and Christina Bohk-Ewald. Mike Neubert and Stephanie Jenouvrier at WHOI have been particularly valuable collaborators. My student, Esther Shyu, helped push sensitivity analysis into new directions. Joel Cohen inspired more of this analysis than may be apparent. I thank Nathan Keyfitz for his example. Shripad Tuljapurkar, Carol Horvitz, and Ulrich Steiner have also explored this territory, and discussions with them have been especially valuable. I have presented courses and workshops on sensitivity analysis at the MPIDR and at meetings of the Ecological Society of America, and participants in those workshops have provided valuable feedback.

I have had the good fortune to collaborate with many researchers on sensitivity analysis, including Azmy Ackleh, Annette Baudisch, Christina Bohk-Ewald, Solange Brault, Silke van Daalen, Michal Engelman, Masami Fujiwara, Nienke Hartemink, Carol Horvitz, Christine Hunter, Stephanie Jenouvrier, Petra Klepac, Tiffany Knight, Eleanor Pardini, Alyson van Raalte, Bonnie Ripley, Gregory Roth, Roberto Salguero-Gomez, Nora Sanchez Gassen, Esther Shyu, Carly Strasser, Yngvild Vindenes, Charlotte de Vries, Martin Wensink, Virginia Zarulli, and Ariane Verdy.

The most tired cliché in book-writing is the one where the author thanks a partner whose support has been essential to completion of the work. Clichés, however, are sometimes true, and, in this case, I owe a huge thanks to my wife, Moira Powers, for her unfailing support.

## **Contents**

#### **Part I Introductory and Methodological**



#### **Part II Linear Models**



#### **Part III Time-Varying and Stochastic Models**


#### **Part IV Nonlinear Models**




#### **Part V Markov Chains**



# **Part I Introductory and Methodological**

# **Chapter 1 Introduction: Sensitivity Analysis – What and Why?**

#### **1.1 Introduction**

Demography is a science that connects individual processes and events to the development of cohorts and then to the dynamics of populations. It does so with mathematical models that distinguish among individuals based on their characteristics.<sup>1</sup> The most familiar such model is the life table, which records mortality and fertility of the individual as a function of age, and is used to calculate properties of cohorts (e.g., the distribution of age at death) and populations (e.g., the intrinsic rate of increase).

The life table is the most familiar, but demography has proceeded far beyond that in both models and analyses. In any case, though, a model is defined first by its structure (the states of individuals and the transitions possible among them), then by the rates at which individuals develop, survive, and reproduce throughout the life cycle, then by the functional dependence of those rates (time-invariant or time-varying, density-independent or density-dependent, deterministic or stochastic), and finally by the values of the parameters that define the rates. A set of parameters operating within a given model generates the demographic outcomes calculated from the model (population growth rate, population structure, equilibria, cycles, measures of longevity, state occupancy times, transient behavior and projections, and so on). The *sensitivity problem* is to understand how the outcome(s) change in response to changes in the parameters.

<sup>1</sup>Technically, these characteristics are known as individual state variables, or i-states (Metz and Diekmann 1986; Caswell 2001). Their task is to capture all the information about the individual's history that is relevant to determining its future fate, and a major task of demography is to discover those aspects of the individual necessary for a successful i-state (e.g., de Vries and Caswell 2017). In the models considered here, the population state (p-state) is a distribution function over the set of i-states. Thus, for example, age as an i-state leads to a population described by its age distribution.

H. Caswell, *Sensitivity Analysis: Matrix Methods in Demography and Ecology*, Demographic Research Monographs, https://doi.org/10.1007/978-3-030-10534-1\_1

Why should we care about the effects of change?


It is not an overstatement to say that no model is ever fully understood unless it includes a sensitivity analysis.

#### **1.2 Sensitivity, Calculus, and Matrix Calculus**

The change in an outcome in response to a change in a parameter can be treated as a problem in differential calculus. Let *ξ* denote some dependent variable and *θ* some parameter. The sensitivity problem can be approached via the derivative

$$\frac{d\xi}{d\theta} \tag{1.1}$$

or the elasticity, or proportional sensitivity,<sup>2</sup>

$$\frac{\epsilon \,\xi}{\epsilon \,\theta} = \frac{\theta}{\xi} \frac{d\xi}{d\theta} = \frac{d \log \xi}{d \log \theta} \tag{1.2}$$

**Note that I will use "sensitivity analysis" to refer generically to both sensitivity and elasticity.**
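To make the definitions concrete, here is a small sketch (my own hypothetical example, not from the text): for a 2 × 2 matrix **A** = [[0, F], [s, P]], the characteristic equation λ² − Pλ − Fs = 0 gives the growth rate λ in closed form, so the sensitivity (1.1) and the elasticity (1.2) of λ with respect to the fertility F can be computed analytically.

```python
import math

# Hypothetical 2x2 matrix A = [[0, F], [s, P]]: the characteristic
# equation lambda^2 - P*lambda - F*s = 0 gives lambda in closed form.
def lam(F, s, P):
    return (P + math.sqrt(P**2 + 4 * F * s)) / 2

# Sensitivity (1.1): analytic derivative d(lambda)/dF.
def dlam_dF(F, s, P):
    return s / math.sqrt(P**2 + 4 * F * s)

# Elasticity (1.2): proportional sensitivity (theta/xi) * d(xi)/d(theta).
def elas_F(F, s, P):
    return (F / lam(F, s, P)) * dlam_dF(F, s, P)

F, s, P = 2.0, 0.5, 0.3
print(lam(F, s, P))      # growth rate
print(dlam_dF(F, s, P))  # sensitivity of lambda to F
print(elas_F(F, s, P))   # elasticity of lambda to F
```

The elasticity can also be read as the log-derivative in (1.2): a 1% change in F produces approximately an elas_F percent change in λ.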

The sensitivity problem is a challenging task, rather than an exercise in undergraduate calculus, because the dependence of *ξ* on *θ* may be complicated, and because *ξ* may be a scalar (e.g., life expectancy at birth or population growth rate), a vector (e.g., a stable stage distribution or a projected population structure), or a matrix (e.g., the matrix of mean occupancy times). Similarly, *θ* may be a scalar (e.g., the Gompertz rate of aging), a vector (e.g., the age schedule of mortality rates), or a matrix (e.g., the transition matrix among life cycle stages). In addition, the chains of causation in even simple demographic models are complicated. Tracing the causal chains from a set of parameters (of which there may be many) to a set of outcomes (again, many) with complicated interactions is hard.

<sup>2</sup>There seems to be no standard notation for elasticities; the one I am using here is based on a suggestion by Samuelson (1947).

This book is an in-depth exploration of sensitivity analyses based on matrix formulations of demographic calculations. Matrix formulations are designed precisely to map transformations from one multidimensional space to another. Thus they simplify computations, clarify notation, and increase analytical power.<sup>3</sup>

The premise of this book is that demography as a discipline is neither defined by, nor limited to, a taxon. You will find here examples and analyses of humans, of non-human animals, and of plants. Human demography and population biology have mutually informed each other from the beginning, and I see no reason for them to stop now.

It is important to remember that the diversity of complex life histories among the species that occupy our world poses a challenge to demographic analysis that is identical to the challenge posed by the complicated lives of humans. The dynamics of health status, family structure, or socio-economic status introduce complications to the life course exactly comparable to the dynamics of size growth in plants, metamorphosis in insects, or breeding status in birds.

**A bit of history** The earliest focus of demographic sensitivity analysis was population growth rate *λ* (or the intrinsic rate of increase *r* = log *λ*) in linear demographic models. Hamilton (1966) was the first to solve this, in the context of the evolution of senescence. Demetrius (1969) derived a corresponding matrix expression, apparently unaware of Hamilton's results. Goodman (1971) was the first to notice the connection to reproductive value (see Chap. 3). Keyfitz (1971) derived the sensitivity of *r*, but also of life expectancy, mean age at death, and other outcomes.

All these analyses were based on age-classified demographic models. These results were generalized to stage-classified models by applying eigenvalue perturbation theory (Caswell 1978), followed by elasticity calculations (de Kroon et al. 1986), sensitivities of eigenvectors (Caswell 1982), lower-level parameters (Caswell 1989*b*), second derivatives of eigenvalues (Caswell 1996), the population spreading rate (Neubert and Caswell 2000), transient dynamics (Caswell 2007), and other things. Following the important early work of Tuljapurkar (1990), the sensitivity analysis of stochastic models developed in parallel with that of deterministic models (e.g. Tuljapurkar et al. 2003; Haridas and Tuljapurkar 2005; Horvitz et al. 2005; Steinsaltz et al. 2011).

<sup>3</sup>That does not mean that calculations made by other means are wrong. I am a methodological pluralist, and I do not believe that it is necessary to attack other methods in order to justify the use of matrix methods.

Matrix calculus, permitting differentiation of scalar-, vector-, or matrix-valued functions of scalar, vector, or matrix arguments, began to be developed in the 1960s (see Nel (1980) for some history and comparison of different methods). The approach we will use here was introduced by Neudecker (1969) and expanded by Magnus and Neudecker (1985). A comprehensive, but mathematically difficult, treatment is given in Magnus and Neudecker (1988). Chapter 2 gives a brief presentation of the matrix calculus methods we will utilize in this book.

#### **1.3 Some Issues**

Sensitivity analysis is more than an algebraic exercise; it is a tool for making inferences and drawing conclusions about substantive demographic issues. It is useful to bring to the discussion a perspective on some questions.

#### *1.3.1 Prospective and Retrospective Analyses: Sensitivity and Decomposition*

If some variable *ξ* is a function of a set of parameters *θ*1*,...,θp*, then the derivative *∂ξ/∂θi* gives the rate of change of *ξ* in response to a change in the *i*th parameter, holding the rest constant. Contrary to what is sometimes assumed, this calculation requires no assumption that it is actually possible to change the parameters. If the flight velocity of pigs is one of the parameters in the model, the analysis will happily answer the question of what would happen if pigs could fly.

Nor is there any assumption that changes in *θi* have ever happened in the past. The sensitivity analysis looks forward, asking what would happen *if* this or that parameter were to change. It is thus referred to as *prospective* analysis (Caswell 2000).

On the other hand, suppose you find yourself considering two values of *ξ* that have resulted from two different situations (times, places, conditions), each with its own set of parameters:

$$
\theta_1^{(1)}, \theta_2^{(1)}, \dots \longrightarrow \xi^{(1)}
$$

$$
\theta_1^{(2)}, \theta_2^{(2)}, \dots \longrightarrow \xi^{(2)}
$$

You ask what caused the difference between *ξ*<sup>(2)</sup> and *ξ*<sup>(1)</sup>. Knowing the derivatives *∂ξ/∂θi* cannot tell you, because you are not asking the counterfactual question of what would happen *if*, but the very factual question of what actually happened between the two situations. This is a *retrospective* analysis, familiar to human demographers as a decomposition problem (e.g., Kitagawa 1955; Canudas Romo 2003).

One widely used approach to understanding the causes of observed differences is life table response experiment (LTRE) analysis,<sup>4</sup> which uses a first-order approximation to decompose the differences,

$$
\Delta\xi = \xi^{(2)} - \xi^{(1)} \approx \sum_{i} \frac{\partial\xi}{\partial\theta_{i}} \left(\theta_{i}^{(2)} - \theta_{i}^{(1)}\right). \tag{1.3}
$$

The *i*th term in the summation is the contribution of the difference in the parameter *θi* to the difference in the outcome, *ξ*. These contributions reflect *both* the sensitivity of *ξ* to the parameters and the differences between conditions in each of the parameters. Parameters to which *ξ* is not very sensitive can make large contributions if the difference in *θi* is big enough. Parameters to which *ξ* is very sensitive can make small contributions if *θi* does not change much. The matrix calculus version of this decomposition is given in Sect. 2.9, applied to differences in life disparity in Chap. 4, to periodic environments in Chap. 8, and explored in the challenging context of stochastic models in Chap. 9.
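As a minimal sketch of the decomposition (1.3), consider a hypothetical outcome, the net reproductive rate R0 = f1 + s1 f2 of a two-age-class life cycle, compared between two hypothetical conditions. The sensitivities are evaluated at the midpoint of the two parameter sets, a common convention for first-order decompositions.

```python
# Hypothetical outcome: net reproductive rate of a two-age-class
# life cycle with fertilities f1, f2 and first-year survival s1.
def R0(f1, f2, s1):
    return f1 + s1 * f2

def gradient(f1, f2, s1):
    # Analytic partial derivatives (dR0/df1, dR0/df2, dR0/ds1).
    return (1.0, s1, f2)

theta1 = (0.5, 2.0, 0.6)   # condition 1 (hypothetical)
theta2 = (0.3, 2.5, 0.8)   # condition 2 (hypothetical)

# Evaluate the sensitivities at the midpoint of the two conditions.
mid = tuple((a + b) / 2 for a, b in zip(theta1, theta2))
grad = gradient(*mid)

# Contribution of each parameter difference to the outcome difference.
contributions = [g * (b - a) for g, a, b in zip(grad, theta1, theta2)]
delta_exact = R0(*theta2) - R0(*theta1)

print(contributions)
print(sum(contributions), delta_exact)
```

Because R0 is bilinear in the parameters, the midpoint approximation happens to be exact here; in general the contributions sum only approximately to Δ*ξ*. Note how the first contribution is negative: the fertility f1 *declined* between the conditions, pulling R0 down even as the other parameters pushed it up.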

The distinction between prospective and retrospective analysis is obvious once the questions they address are specified, but it has challenged a number of authors (e.g., Wisdom and Mills 1997; Manlik et al. 2017). A particularly insightful discussion of these ideas, in somewhat different terminology, appears in Nathan Keyfitz's essay, *How do we know the facts of demography?*, which now appears as Chapter 20 of Keyfitz and Caswell (2005).

#### *1.3.2 Uncertainty Propagation*

Suppose that *ξ* is a function of *θ*, but *θ* is known only imperfectly. Then *ξ* is also known only imperfectly; the uncertainty in *θ* is propagated from *θ* to *ξ* . The sensitivity *dξ/dθ* alone says nothing about uncertainty, and the uncertainty in *ξ* says nothing about the sensitivity.

Uncertainty propagation can be calculated by simulation if a probability distribution is known that can describe the uncertainty in *θ*. Sampling from this distribution and calculating *ξ* for each sampled parameter gives the distribution of *ξ* resulting from the uncertainty in *θ* (e.g. Caswell et al. 1998; Salomon et al. 2001). If the distribution of *θ* comes from an empirical set of measurements, this approach converges to the bootstrap (Efron and Tibshirani 1993). If *θ* has a parametric distribution (e.g., the multivariate normal distribution returned by maximum likelihood estimation), the technique is sometimes known as a parametric bootstrap (e.g., Regehr et al. 2010).

<sup>4</sup>This awkward but well-entrenched nomenclature was created when I was trying to understand the interpretation of experiments in ecotoxicology, in which laboratory cohorts would be exposed to some noxious substance and a life table (mortality and fertility schedule) measured as a response variable (Caswell 1989*a*). It soon became apparent that the method could be applied to any comparison of different conditions, and that the response could be any demographic variable. See Caswell (2001, Chapter 10) for details.
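A minimal simulation sketch, with a hypothetical model and distribution: a constant mortality hazard μ gives life expectancy e0 = 1/μ, and the uncertainty in μ is taken to be normal. Sampling μ and computing e0 for each draw yields the induced distribution of the outcome.

```python
import random
import statistics

# Hypothetical: constant hazard mu, life expectancy e0 = 1/mu.
# Uncertainty in mu is described by a normal distribution.
random.seed(1)
mu_mean, mu_sd = 0.02, 0.002

# Sample theta from its distribution; compute xi for each draw.
e0_samples = [1.0 / random.gauss(mu_mean, mu_sd) for _ in range(10_000)]

m = statistics.mean(e0_samples)
sd = statistics.stdev(e0_samples)
print(m, sd)  # distribution of xi induced by uncertainty in theta
```

The mean of the sampled e0 exceeds 1/mu_mean = 50 slightly, because 1/μ is convex in μ (Jensen's inequality); simulation captures such nonlinear effects automatically, where the first-order approximation below does not.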

Sensitivity analysis can contribute to uncertainty propagation analysis through the first-order, small-variance approximation to the variance in *ξ*,

$$V(\xi) \approx \sum_{i,j} \left(\frac{\partial \xi}{\partial \theta_i}\right) \left(\frac{\partial \xi}{\partial \theta_j}\right) \text{Cov}(\theta_i, \theta_j) \tag{1.4}$$

Notice again that sensitivity does not, by itself, say anything about uncertainty, but it does show how the (co)variance in parameters will propagate to the variance in the outcome *ξ* .
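The variance approximation (1.4) is a quadratic form in the gradient of *ξ*; here is a sketch with hypothetical numbers, reusing R0 = f1 + s1 f2 as the outcome.

```python
# First-order variance approximation (1.4), hypothetical numbers.
# Outcome: R0 = f1 + s1*f2; gradient evaluated at the parameter means.
f1, f2, s1 = 0.5, 2.0, 0.6
grad = [1.0, s1, f2]  # (dR0/df1, dR0/df2, dR0/ds1)

# Hypothetical covariance matrix of (f1, f2, s1); the off-diagonal
# entries let correlated parameters contribute jointly to V(xi).
cov = [[0.01, 0.00, 0.00],
       [0.00, 0.04, 0.01],
       [0.00, 0.01, 0.02]]

# Quadratic form: sum_{i,j} (dxi/dtheta_i)(dxi/dtheta_j) Cov(i, j).
V = sum(grad[i] * grad[j] * cov[i][j]
        for i in range(3) for j in range(3))
print(V)  # approximate variance of R0
```

The cross terms (here the f2–s1 covariance) can matter as much as the diagonal variances: positively correlated parameters with same-sign sensitivities inflate V(*ξ*), while negative correlations can cancel.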

#### *1.3.3 Why Not Just Simulate?*

If you work on these problems, or if you apply these methods in particular studies, eventually you will be asked (often by a reviewer), why not just do it all by simulation? Just evaluate *ξ* at the value *θ* and at *θ* + Δ*θ*, and then approximate the derivative as

$$\frac{\Delta\xi}{\Delta\theta} = \frac{\xi(\theta + \Delta\theta) - \xi(\theta)}{\Delta\theta} \tag{1.5}$$

for some very small value of Δ*θ*.

Three answers come to mind. First, if *θ* and the model are of sufficiently high dimension, there can be a lot of these perturbations to be calculated. For example, population projections of the type analyzed by Caswell and Sanchez Gassen (2015), with 102 ages, 2 sexes, 3 vital rates, and projections on the order of 50 years, have over 30,000 parameters. A numerical perturbation of each of these would be painful.

Second, the computation of derivatives by numerical perturbations is a notoriously ill-behaved problem. A standard reference on computations in applied mathematics says that this approximation is "almost guaranteed to produce inaccurate results" (Press et al. 1992, p. 185). It is subject to truncation error (caused by making the perturbation too large) and roundoff error (caused by making the perturbation too small). In some applications these errors will be unimportant, but in others they can be crucial (see Hunter and Caswell 2009 for an example in mark-recapture analysis).
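The truncation/roundoff tradeoff is easy to demonstrate with a hypothetical function whose derivative is known exactly, so the error of the finite-difference approximation (1.5) can be measured directly at several step sizes.

```python
import math

# f(x) = exp(x), so f'(1) = e exactly; the error of (1.5) is then
# directly measurable for each step size h.
f, x, exact = math.exp, 1.0, math.e

errs = {}
for h in (1e-1, 1e-8, 1e-15):
    approx = (f(x + h) - f(x)) / h
    errs[h] = abs(approx - exact)
    print(h, errs[h])
# Large h: truncation error dominates.  Tiny h: roundoff error
# (cancellation in the numerator) dominates.  Only an intermediate
# step size gives an accurate derivative.
```

For forward differences the best attainable accuracy occurs near a step of roughly the square root of machine epsilon (about 10⁻⁸ in double precision): larger steps incur truncation error, smaller steps incur cancellation, which is exactly the dilemma described above.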

Third, and more basic and telling: an exact answer is always an improvement over an approximation. When an exact answer is available, in an easily computable form, there must be strong arguments to support the idea that a less efficient and less accurate approximation is just as good. And having both exact and approximate methods is even better.

These arguments apply to numerical calculation of derivatives. But simulation has an important place in analyzing *scenarios*; i.e., the results of specified collections of parameters, usually with multiple and large differences among them. When population projections are reported with "high," "medium," and "low" fertility scenarios, the point is to compare a range of multivariate alternatives. Other examples include comparisons of screening procedures for colorectal cancer (Wu et al. 2006), or projections based on IPCC global climate models (e.g., Jenouvrier et al. 2012). In principle, sensitivity analysis could support these calculations by suggesting interesting scenarios, highlighting the parameters with the biggest impact on the outcome.

#### *1.3.4 Sensitivity and Identifying Targets for Intervention*

To intervene is to change something. Population biologists concerned with endangered species would like to intervene to increase the population growth rate. Those concerned with invasive pests would like to do the same, but in the opposite direction. Human demographers focused on aging societies wonder about how policies would change age distributions or dependency ratios. In all these cases, the interventions operate through changes in demographic parameters, and thus sensitivity analysis can reveal something about their effects.

This logic has led to the use of prospective perturbation analyses in conservation biology, using the sensitivity or elasticity of population growth rate to identify promising targets for intervention. The first such use involved the loggerhead sea turtle (Crouse et al. 1987). Standard practice at the time was to focus on protecting eggs and hatchlings on nesting beaches. But a sensitivity analysis showed that population growth rate was not very sensitive to these stages, and much more sensitive to changes in survival of adults at sea. This led to a recommendation, and then a policy, to install "turtle excluder devices" on the nets used in coastal shrimp fisheries in the United States, to reduce mortality due to adult turtles being captured in those nets.

This basic idea has become a part of the toolkit for conservation biology, but has also fallen victim to a kind of magical thinking that first makes unrealistic expectations of the sensitivity analysis and then blames the analysis for failing to meet those expectations. For a recent example see Manlik et al. (2017); for a thorough description of the issues and some of their solutions, see Caswell (2001, Chapter 18).

The fact remains that knowing the sensitivity of some outcome *ξ* to some parameter *θ* gives the rate of change of *ξ* in response to an intervention that changes *θ*. That is valuable information to have in considering the various interventions that might bring about a desired change.

#### *1.3.5 The Dream of Easy Interpretation*

This book is full of long and complicated formulas. Occasionally, these formulas yield easy, readily apparent, qualitative interpretations.<sup>5</sup> But not often. There is a reason for this. The formulas are complicated because the processes are complicated, and because the results are given at a high level of generality. Chapter 10, for example, analyzes the sensitivity of nonlinear, density-dependent models. It derives a complicated formula for the sensitivity of *any* function of the equilibrium population, to changes in *any* parameter affecting *any* of the vital rates, in *any* age- or stage-specific way, for *any* choice of stage classification and *any* survival, fertility, and transition rates, with *any* pattern of density dependence, for *any* species with *any* kind of life history. Accounting for that web of dependencies, in such generality, makes finding an easily interpretable formula an unlikely dream.

Not an impossible dream, but in general, insights of that kind arise from simplifying general methods to address particular situations. Specifying a particular demographic structure, choosing an outcome variable of interest, and carefully specifying the functional dependencies, if done skillfully, can lead to qualitative results.

#### **1.4 The Importance of Change**

Questions of *change* lurk in almost every demographic (every scientific?) study. We ask how things have changed in the past, how they differ among populations in the present, and how they will, or may, change in the future. Even apparently simple descriptive statements (the results of a census in a particular time and place, for example) are almost immediately examined in comparison with other times and/or places.

Sensitivity analysis is a powerful tool for analyzing change, in the special case of demographic outcomes that are calculated as functions of some set of parameters. As the chapters to come will make clear, this covers a wide landscape of interesting demographic questions. And the list is not yet complete.

<sup>5</sup>For example the interpretation of the sensitivity of population growth rate to matrix elements in terms of stable structure and reproductive value in Chap. 3, or the sensitivity of life expectancy to mortality in terms of occupancy time and transition probabilities in Chap. 4.


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Chapter 2 Matrix Calculus and Notation**

#### **2.1 Introduction: Can It Possibly Be That Simple?**

In October of 2005, I scribbled in a notebook, "can it possibly be *that* simple?" I was referring to the sensitivity of transient dynamics (the eventual results appear in Chap. 7), and had just begun to use matrix calculus as a tool. The answer to my question was yes. It can be that simple.

This book relies on this set of mathematical techniques. This chapter introduces the basics, which will be used throughout the text. For more information, I recommend four sources in particular. The most complete treatment, but not the easiest starting point, is the book by Magnus and Neudecker (1988). More accessible introductions can be found in the paper by Magnus and Neudecker (1985) and especially the text by Abadir and Magnus (2005). A review paper by Nel (1980) is helpful in placing the Magnus-Neudecker formulation in the context of other attempts at a calculus of matrices.

Sensitivity analysis asks how much change in an outcome variable *y* is caused by a change in some parameter *x*. At its most basic level, and with some reasonable assumptions about the continuity and differentiability of the functional relationships involved, the solution is given by differential calculus. If *y* is a function of *x*, then the derivative

$$\frac{dy}{dx}$$

tells how *y* responds to a change in *x*, i.e., the sensitivity of *y* to a change in *x*.

However, the outcomes of a demographic calculation may be scalar-valued (e.g., the population growth rate *λ*), vector-valued (e.g., the stable stage distribution), or matrix-valued (e.g., the fundamental matrix). Any of these outcomes may be functions of scalar-valued parameters (e.g., the Gompertz aging rate), vector-valued parameters (e.g., the mortality schedule), or matrix-valued parameters (e.g., the transition matrix). Thus, sensitivity analysis in demography requires more than the simple scalar derivative *dy/dx*. We want a consistent and flexible approach to differentiating

$$\begin{Bmatrix} \text{scalar-valued} \\ \text{vector-valued} \\ \text{matrix-valued} \end{Bmatrix} \text{ functions of } \begin{Bmatrix} \text{scalar} \\ \text{vector} \\ \text{matrix} \end{Bmatrix} \text{ arguments}$$

#### **2.2 Notation and Matrix Operations**

#### *2.2.1 Notation*

Matrices are denoted by upper case bold symbols (e.g., **A**), vectors (usually) by lower case bold symbols (**n**). The *(i, j )* entry of the matrix **A** is *aij* , and the *i*th entry of the vector **n** is *ni*. Sometimes we will use MATLAB notation, and write

$$\mathbf{X}(i,:) = \text{row } i \text{ of } \mathbf{X} \tag{2.1}$$

$$\mathbf{X}(:,j) = \text{column } j \text{ of } \mathbf{X} \tag{2.2}$$

The notation

$$\left( x\_{ij} \right)$$

denotes a matrix whose *(i, j )* entry is *xij* . For example,

$$\left(\frac{dy\_i}{dx\_j}\right)$$

is the matrix whose *(i, j )* entry is the derivative of *yi* with respect to *xj* .

The transpose of **X** is **X**<sup>T</sup>. Logarithms are natural logarithms. The vector norm ‖**x**‖ is, unless noted otherwise, the 1-norm. The symbol D *(***x***)* denotes the square matrix with **x** on the diagonal and zeros elsewhere. The symbol **1** denotes a vector of ones. The vector **e***<sup>i</sup>* is a unit vector with 1 in the *i*th entry and zeros elsewhere. The identity matrix is **I**. Where necessary for clarity, the dimension of matrices or vectors will be indicated by a subscript. Thus **I***<sup>s</sup>* is an *s* × *s* identity matrix, **1***<sup>s</sup>* is an *s* × 1 vector of ones, and **X***m*×*<sup>n</sup>* is an *m* × *n* matrix.

In some places (Chaps. 6 and 10) block-structured matrices appear; these are denoted by either 𝔸 or **Ã**, depending on the context and the role of the matrix.

#### *2.2.2 Operations*

In addition to the familiar matrix product **AB**, we will also use the Hadamard, or elementwise product

$$\mathbf{A} \circ \mathbf{B} = \left( a\_{ij} b\_{ij} \right) \tag{2.3}$$

and the Kronecker product

$$\mathbf{A} \otimes \mathbf{B} = \left( a\_{ij} \mathbf{B} \right) \tag{2.4}$$

The Hadamard product requires that **A** and **B** be the same size. The Kronecker product is defined for any sizes of **A** and **B**. Some useful properties of the Kronecker product include (with (2.5) requiring **A** and **B** to be square and nonsingular)

$$(\mathbf{A}\otimes\mathbf{B})^{-1}=\left(\mathbf{A}^{-1}\otimes\mathbf{B}^{-1}\right)\tag{2.5}$$

$$(\mathbf{A}\otimes\mathbf{B})^{\mathsf{T}}=\left(\mathbf{A}^{\mathsf{T}}\otimes\mathbf{B}^{\mathsf{T}}\right)\tag{2.6}$$

$$\mathbf{A}\otimes(\mathbf{B}+\mathbf{C})=(\mathbf{A}\otimes\mathbf{B})+(\mathbf{A}\otimes\mathbf{C})\tag{2.7}$$

and, provided that the matrices are of the right size for the products to be defined,

$$(\mathbf{A}\_1 \otimes \mathbf{B}\_1) \left(\mathbf{A}\_2 \otimes \mathbf{B}\_2\right) = \left(\mathbf{A}\_1 \mathbf{A}\_2 \otimes \mathbf{B}\_1 \mathbf{B}\_2\right). \tag{2.8}$$
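These properties are easy to verify numerically. Here is a minimal sketch in Python with NumPy (standing in for the MATLAB notation used elsewhere in this book; the matrices are arbitrary random examples), checking (2.5), (2.6), and (2.8):

```python
import numpy as np

rng = np.random.default_rng(1)
A1, A2 = rng.random((3, 3)), rng.random((3, 3))
B1, B2 = rng.random((2, 2)), rng.random((2, 2))

# Mixed-product property (2.8): (A1 ⊗ B1)(A2 ⊗ B2) = (A1 A2) ⊗ (B1 B2)
assert np.allclose(np.kron(A1, B1) @ np.kron(A2, B2),
                   np.kron(A1 @ A2, B1 @ B2))

# Inverse property (2.5), for square nonsingular A and B
assert np.allclose(np.linalg.inv(np.kron(A1, B1)),
                   np.kron(np.linalg.inv(A1), np.linalg.inv(B1)))

# Transpose property (2.6)
assert np.allclose(np.kron(A1, B1).T, np.kron(A1.T, B1.T))
```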

#### *2.2.3 The Vec Operator and Vec-Permutation Matrix*

The vec operator transforms an *m* × *n* matrix **A** into an *mn* × 1 vector by stacking the columns one above the next,

$$\operatorname{vec} \mathbf{A} = \begin{pmatrix} \mathbf{A}(:, \, 1) \\ \vdots \\ \mathbf{A}(:, \, n) \end{pmatrix} \tag{2.9}$$

For example,

$$\text{vec}\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a \\ c \\ b \\ d \end{pmatrix} . \tag{2.10}$$

The vec of **A** and the vec of **A**<sup>T</sup> are rearrangements of the same entries; they are related by

$$\mathbf{vec}\,\mathbf{A}^{\mathsf{T}} = \mathbf{K}\_{m,n}\mathbf{vec}\,\mathbf{A} \tag{2.11}$$

where **A** is *m* × *n* and **K***m,n* is the *vec-permutation matrix* (Henderson and Searle 1981) or *commutation matrix* (Magnus and Neudecker 1979). The vec-permutation matrix can be calculated as

$$\mathbf{K}\_{m,n} = \sum\_{i=1}^{m} \sum\_{j=1}^{n} \left( \mathbf{E}\_{ij} \otimes \mathbf{E}\_{ij}^{\sf T} \right) \tag{2.12}$$

where **E***ij* is a matrix, of dimension *m* × *n*, with a 1 in the *(i, j )* entry and zeros elsewhere. Like any permutation matrix, **K**<sup>−1</sup> = **K**<sup>T</sup>.
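A direct way to confirm (2.11) and (2.12) is to build **K***m,n* from its definition and apply it to a small example. The sketch below does so in Python with NumPy; the helper functions `vec` and `vec_perm` are my own, not library routines:

```python
import numpy as np

def vec(X):
    # Stack the columns of X into a single column vector, as in (2.9)
    return X.reshape(-1, 1, order="F")

def vec_perm(m, n):
    # Vec-permutation (commutation) matrix K_{m,n} from equation (2.12)
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            E = np.zeros((m, n))
            E[i, j] = 1.0
            K += np.kron(E, E.T)
    return K

A = np.arange(6.0).reshape(2, 3)           # a 2 x 3 example matrix
K = vec_perm(2, 3)
assert np.allclose(vec(A.T), K @ vec(A))   # equation (2.11)
assert np.allclose(np.linalg.inv(K), K.T)  # K^{-1} = K^T
```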

The vec operator and the vec-permutation matrix are particularly important in multistate models (e.g., age×stage-classified models), where they are used in both the formulation and analysis of the models (e.g., Caswell 2012, 2014; Caswell and Salguero-Gómez 2013; Caswell et al. 2018); see also Chap. 6. Extensions to an arbitrary number of dimensions, so-called hyperstate models, have been presented by Roth and Caswell (2016).

#### *2.2.4 Roth's Theorem*

The vec operator and the Kronecker product are connected by a theorem due to Roth (1934):

$$\text{vec}\ (\mathbf{A}\mathbf{B}\mathbf{C}) = \left(\mathbf{C}^{\mathsf{T}} \otimes \mathbf{A}\right)\text{vec}\ \mathbf{B}.\tag{2.13}$$

We will often want to obtain the vec of a matrix that appears in the middle of a product; we will use Roth's theorem repeatedly.
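Roth's theorem is easily checked numerically. A minimal Python/NumPy sketch, with arbitrary conformable random matrices:

```python
import numpy as np

def vec(X):
    # stack the columns of X, as in (2.9)
    return X.reshape(-1, 1, order="F")

rng = np.random.default_rng(0)
A = rng.random((2, 3))
B = rng.random((3, 4))
C = rng.random((4, 2))

# Roth's theorem (2.13): vec(ABC) = (C^T ⊗ A) vec B
assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))
```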

#### **2.3 Defining Matrix Derivatives**

The derivative of a scalar *y* with respect to a scalar *x* is familiar. What, however, does it mean to speak of the derivative of a scalar with respect to a vector, or of a vector with respect to another vector, or any other combination? These can be defined in more than one way and the choice is critical (Nel 1980; Magnus and Neudecker 1985). This book relies on the notation due to Magnus and Neudecker, because it makes certain operations possible and consistent.

• If *x* and *y* are scalars, the derivative of *y* with respect to *x* is the familiar derivative *dy/dx*.


• If **y** is an *n* × 1 vector and *x* a scalar, the derivative of **y** with respect to *x* is the *n* × 1 column vector

$$\frac{d\mathbf{y}}{dx} = \begin{pmatrix} \frac{d\mathbf{y}\_1}{dx} \\ \vdots \\ \frac{d\mathbf{y}\_n}{dx} \end{pmatrix}. \tag{2.14}$$

• If *y* is a scalar and **x** an *m* × 1 vector, the derivative of *y* with respect to **x** is the 1 × *m* row vector (called the gradient vector)

$$\frac{dy}{d\mathbf{x}^{\mathsf{T}}} = \left(\frac{\partial y}{\partial x\_1} \cdots \frac{\partial y}{\partial x\_m}\right). \tag{2.15}$$

Note the orientation of *d***y***/dx* as a column vector and *dy/d***x**<sup>T</sup> as a row vector.

• If **y** is an *n* × 1 vector and **x** an *m* × 1 vector, the derivative of **y** with respect to **x** is defined to be the *n* × *m* matrix whose *(i, j )* entry is the derivative of *yi* with respect to *xj* , i.e.,

$$\frac{d\mathbf{y}}{d\mathbf{x}^{\mathsf{T}}} = \left(\frac{dy\_i}{dx\_j}\right) \tag{2.16}$$

(this matrix is called the Jacobian matrix).

• Derivatives involving matrices are written by first transforming the matrices into vectors using the vec operator, and then applying the rules for vector differentiation to the resulting vectors. Thus, the derivative of the *m* × *n* matrix **Y** with respect to the *p* × *q* matrix **X** is the *mn* × *pq* matrix

$$\frac{d\mathbf{vec}\,\mathbf{Y}}{d\,(\mathbf{vec}\,\mathbf{X})^{\mathsf{T}}}.\tag{2.17}$$

From now on, I will write vec<sup>T</sup> **X** for *(*vec **X***)*<sup>T</sup>.

#### **2.4 The Chain Rule**

The chain rule for differentiation is your friend. The Magnus-Neudecker notation, unlike some alternatives, extends the familiar scalar chain rule to derivatives of vectors and matrices (Nel 1980; Magnus and Neudecker 1985). If **u** (size *m* × 1) is a function of **v** (size *n* × 1) and **v** is a function of **x** (size *p* × 1), then

$$\underbrace{\frac{d\mathbf{u}}{d\mathbf{x}^{\mathsf{T}}}}\_{m\times p} = \underbrace{\left(\frac{d\mathbf{u}}{d\mathbf{v}^{\mathsf{T}}}\right)}\_{m\times n} \underbrace{\left(\frac{d\mathbf{v}}{d\mathbf{x}^{\mathsf{T}}}\right)}\_{n\times p} \tag{2.18}$$

Notice that the dimensions are correct, and that the order of the multiplication matters. Checking dimensional consistency in this way is a useful habit for catching errors.
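The chain rule (2.18) can be verified against a small finite perturbation. In this Python/NumPy sketch, the maps **v**(**x**) and **u**(**v**) and their Jacobians are hypothetical examples of my own choosing:

```python
import numpy as np

# Hypothetical maps: v(x) = (x1*x2, x1+x2), u(v) = (v1^2, v1*v2, v2)
def v(x):  return np.array([x[0] * x[1], x[0] + x[1]])
def u(w):  return np.array([w[0] ** 2, w[0] * w[1], w[1]])

def Jv(x): return np.array([[x[1], x[0]], [1.0, 1.0]])       # dv/dx^T (2x2)
def Ju(w): return np.array([[2 * w[0], 0.0],
                            [w[1], w[0]],
                            [0.0, 1.0]])                     # du/dv^T (3x2)

x0 = np.array([0.7, 1.3])
dx = 1e-7 * np.array([1.0, -1.0])

# Chain rule (2.18): du/dx^T = (du/dv^T)(dv/dx^T), dimensions (3x2)=(3x2)(2x2)
J = Ju(v(x0)) @ Jv(x0)
assert np.allclose(u(v(x0 + dx)) - u(v(x0)), J @ dx, atol=1e-12)
```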

#### **2.5 Derivatives from Differentials**

The key to the matrix calculus of Magnus and Neudecker (1988) is the relationship between the differential and the derivative of a function. Experience suggests that, for many readers of this book, this relationship is shrouded in the mists of long-ago calculus classes.

#### *2.5.1 Differentials of Scalar Functions*

Start with scalars. Suppose that *y* = *f (x)* is a differentiable function at *x* = *x*0. Then the derivative of *y* with respect to *x* at the point *x*<sup>0</sup> is defined as

$$f'(x\_0) = \lim\_{h \to 0} \frac{f(x\_0 + h) - f(x\_0)}{h}. \tag{2.19}$$

Now define the differential of *y*. This is customarily denoted *dy*, but for the moment, I will denote it by *cy*. The differential of *y* at *x*<sup>0</sup> is a function of *h*, defined by

$$cy(x\_0, h) = f'(x\_0)\,h. \tag{2.20}$$

There is no requirement that *h* be "small." Since *x* is a function of itself, *x* = *g(x)*, with *g*′*(x)* = 1, we also have *cx(x*0*, h)* = *g*′*(x*0*)h* = *h*. Thus the ratio of the differential of *y* and the differential of *x* is

$$\frac{cy(x\_0, h)}{cx(x\_0, h)} = \frac{f'(x\_0)h}{h} = f'(x\_0). \tag{2.21}$$

That is, *the derivative is equal to the ratio of the differentials*.

Now, return to the standard notation of *dy* for the differential of *y*. This gives two meanings to the familiar notation for derivatives,

$$\left. \frac{dy}{dx} \right|\_{x\_0} = f'(x\_0). \tag{2.22}$$

The left hand side can be regarded either as equivalent to the limit (2.19) or the ratio of the differentials given by (2.21). Mathematicians are strangely unconcerned with this ambiguity (e.g., Hardy 1952).

All this leads to a set of familiar rules for calculating differentials that guarantee that they can be used to create derivatives. A few of these, for scalars, are

$$d(u+v) = du + dv\tag{2.23}$$

$$d(cu) = c \, du\tag{2.24}$$

$$d(uv) = u(dv) + (du)v \tag{2.25}$$

$$d(e^{u}) = e^{u}\,du\tag{2.26}$$

$$d(\log u) = \frac{1}{u} du \tag{2.27}$$

If *y* = *f (x*1*, x*2*)*, then the total differential is

$$dy = \frac{\partial f}{\partial x\_1} dx\_1 + \frac{\partial f}{\partial x\_2} dx\_2. \tag{2.28}$$

Derivatives can be constructed from these expressions at will by dividing by differentials. For example, dividing (2.23) by *dx* gives *d(u* + *v)/dx* = *du/dx* + *dv/dx*. From (2.28), we have

$$\frac{dy}{dx\_1} = \frac{\partial f}{\partial x\_1} + \frac{\partial f}{\partial x\_2} \frac{dx\_2}{dx\_1} \tag{2.29}$$

$$\frac{dy}{dx\_2} = \frac{\partial f}{\partial x\_1}\frac{dx\_1}{dx\_2} + \frac{\partial f}{\partial x\_2}.\tag{2.30}$$

#### *2.5.2 Differentials of Vectors and Matrices*

To extend these concepts to matrices, we define the differential of a matrix (or vector) as the matrix (or vector) of differentials of the elements; i.e.,

$$d\mathbf{X} = \left(dx\_{ij}\right). \tag{2.31}$$

This definition leads to some basic rules for differentials of matrices:

$$d(c\mathbf{U}) = c(d\mathbf{U})\tag{2.32}$$

$$d(\mathbf{U} + \mathbf{V}) = d\mathbf{U} + d\mathbf{V} \tag{2.33}$$

$$d(\mathbf{U}\mathbf{V}) = (d\mathbf{U})\mathbf{V} + \mathbf{U}(d\mathbf{V})\tag{2.34}$$

$$d(\mathbf{U}\otimes\mathbf{V})=(d\mathbf{U})\otimes\mathbf{V}+\mathbf{U}\otimes(d\mathbf{V})\tag{2.35}$$

$$d(\mathbf{U}\circ \mathbf{V}) = (d\mathbf{U})\circ \mathbf{V} + \mathbf{U}\circ (d\mathbf{V})\tag{2.36}$$

$$d\mathbf{vec}\,\mathbf{U} = \mathbf{vec}\,d\mathbf{U} \tag{2.37}$$

where *c* is a constant, and, of course, the dimensions of **U** and **V** must be conformable. The differential of an operator applied elementwise to a vector can be obtained from the differentials of the elements. For example, suppose **u** is an *s* × 1 vector, and the exponential is applied elementwise. Then

$$d(\exp(\mathbf{u})) = \begin{pmatrix} e^{u\_1} du\_1 \\ \vdots \\ e^{u\_s} du\_s \end{pmatrix} \tag{2.38}$$

$$=\mathcal{D}\left[\exp(\mathbf{u})\right]d\mathbf{u}.\tag{2.39}$$
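Because (2.39) is a first-order relation, it can be checked against a small finite perturbation. This Python/NumPy sketch (with an arbitrary random vector) compares the actual change in exp(**u**) with the prediction of (2.39):

```python
import numpy as np

rng = np.random.default_rng(2)
u = rng.random(4)
du = 1e-7 * rng.random(4)

# Equation (2.39): d(exp(u)) = D(exp(u)) du, exponential applied elementwise
actual = np.exp(u + du) - np.exp(u)
predicted = np.diag(np.exp(u)) @ du
assert np.allclose(actual, predicted, atol=1e-12)
```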

If **y** is a function of **x**<sup>1</sup> and **x**2, the total differential is given just as in (2.28), by

$$d\mathbf{y} = \frac{\partial \mathbf{y}}{\partial \mathbf{x}\_1^\mathsf{T}} d\mathbf{x}\_1 + \frac{\partial \mathbf{y}}{\partial \mathbf{x}\_2^\mathsf{T}} d\mathbf{x}\_2 \tag{2.40}$$

#### **2.6 The First Identification Theorem**

For scalar *y* and *x*,

$$dy = q\,dx \implies \frac{dy}{dx} = q. \tag{2.41}$$

That much is easy. But, suppose that **y** is an *n* × 1 vector function of the *m* × 1 vector **x**. The differential *d***y** is the *n* × 1 vector

$$d\mathbf{y} = \begin{pmatrix} d\mathbf{y}\_1 \\ \vdots \\ d\mathbf{y}\_n \end{pmatrix} \tag{2.42}$$

which, by the total derivative rule, is

$$d\mathbf{y} = \begin{pmatrix} \frac{\partial y\_1}{\partial x\_1} dx\_1 + \dots + \frac{\partial y\_1}{\partial x\_m} dx\_m\\ \vdots\\ \frac{\partial y\_n}{\partial x\_1} dx\_1 + \dots + \frac{\partial y\_n}{\partial x\_m} dx\_m \end{pmatrix} \tag{2.43}$$

$$= \begin{pmatrix} \frac{\partial y\_1}{\partial x\_1} & \cdots & \frac{\partial y\_1}{\partial x\_m} \\ \vdots & & \vdots \\ \frac{\partial y\_n}{\partial x\_1} & \cdots & \frac{\partial y\_n}{\partial x\_m} \end{pmatrix} \begin{pmatrix} dx\_1 \\ \vdots \\ dx\_m \end{pmatrix} \tag{2.44}$$

$$= \mathbf{Q} \, d\mathbf{x}.\tag{2.45}$$

If these were scalars, dividing both sides by *d***x** would give **Q** as the derivative of **y** with respect to **x**. But, one cannot divide by a vector. Instead, Magnus and Neudecker proved that if it can be shown that

$$d\mathbf{y} = \mathbf{Q} \, d\mathbf{x} \tag{2.46}$$

then the derivative is

$$\frac{d\mathbf{y}}{d\mathbf{x}^{\mathsf{T}}} = \mathbf{Q}.\tag{2.47}$$

This is the First Identification Theorem of Magnus and Neudecker (1988).<sup>1</sup>
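The theorem can be illustrated numerically: if *d***y** = **Q** *d***x**, then **Q** is the Jacobian. A Python/NumPy sketch, for a hypothetical function **y** = (*x*1*x*2, *x*1 + *x*2²) of my own choosing:

```python
import numpy as np

# Hypothetical vector function: y1 = x1*x2, y2 = x1 + x2^2
def f(x):
    return np.array([x[0] * x[1], x[0] + x[1] ** 2])

def Q(x):
    # The matrix Q in dy = Q dx: the Jacobian dy/dx^T
    return np.array([[x[1], x[0]],
                     [1.0, 2.0 * x[1]]])

x0 = np.array([1.5, 2.0])
dx = 1e-7 * np.array([1.0, -0.5])
# dy = Q dx identifies Q as the derivative (to first order in dx)
assert np.allclose(f(x0 + dx) - f(x0), Q(x0) @ dx, atol=1e-12)
```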

#### *2.6.1 The Chain Rule and the First Identification Theorem*

Suppose that *d***y** is given by (2.46), and that **x** is in turn a function of some vector *θ*. Then

$$d\mathbf{x} = \frac{d\mathbf{x}}{d\boldsymbol{\theta}^{\mathsf{T}}}\, d\boldsymbol{\theta} \tag{2.48}$$

and

$$\frac{d\mathbf{y}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \mathbf{Q} \frac{d\mathbf{x}}{d\boldsymbol{\theta}^{\mathsf{T}}}.\tag{2.49}$$

In other words, the differential expression (2.46) can be transformed into a derivative with respect to any vector by careful use of the chain rule. This applies equally to more complicated expressions for the differential. Suppose that

$$d\mathbf{y} = \mathbf{Q}d\mathbf{x} + \mathbf{R}d\mathbf{z}.\tag{2.50}$$

<sup>1</sup>There is also a second identification theorem that provides the second derivatives of matrix functions. See Shyu and Caswell (2014) for applications of this theory to the second derivatives of measures of population growth rate.

Applying the chain rule to the differentials on the right hand side gives

$$d\mathbf{y} = \mathbf{Q}\frac{d\mathbf{x}}{d\boldsymbol{\theta}^{\mathsf{T}}}\,d\boldsymbol{\theta} + \mathbf{R}\frac{d\mathbf{z}}{d\boldsymbol{\theta}^{\mathsf{T}}}\,d\boldsymbol{\theta}\tag{2.51}$$

for any vector *θ*. Thus

$$d\mathbf{y} = \left(\mathbf{Q}\frac{d\mathbf{x}}{d\boldsymbol{\theta}^{\mathsf{T}}} + \mathbf{R}\frac{d\mathbf{z}}{d\boldsymbol{\theta}^{\mathsf{T}}}\right)d\boldsymbol{\theta},\tag{2.52}$$

and the First Identification Theorem gives

$$\frac{d\mathbf{y}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\mathbf{Q}\frac{d\mathbf{x}}{d\boldsymbol{\theta}^{\mathsf{T}}} + \mathbf{R}\frac{d\mathbf{z}}{d\boldsymbol{\theta}^{\mathsf{T}}}\right). \tag{2.53}$$

#### **2.7 Elasticity**

When parameters are measured on different scales, it is sometimes helpful to calculate proportional effects of proportional perturbations, also called elasticities. The elasticity of *yi* to *θj* is

$$\frac{\epsilon y\_i}{\epsilon \theta\_j} = \frac{\theta\_j}{y\_i}\, \frac{dy\_i}{d\theta\_j}.\tag{2.54}$$

For vectors **y** and *θ*, this becomes

$$\frac{\epsilon \mathbf{y}}{\epsilon \boldsymbol{\theta}^{\mathsf{T}}} = \mathcal{D} \left( \mathbf{y} \right)^{-1} \frac{d \mathbf{y}}{d \boldsymbol{\theta}^{\mathsf{T}}} \mathcal{D} \left( \boldsymbol{\theta} \right). \tag{2.55}$$

There seems to be no accepted notation for elasticities; the notation used here is adapted from that in Samuelson (1947).
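As a small illustration (with a hypothetical outcome **y** = (*θ*1*θ*2, *θ*1²) of my own choosing, not a demographic model), equation (2.55) recovers the familiar scalar elasticities:

```python
import numpy as np

theta = np.array([2.0, 3.0])
y = np.array([theta[0] * theta[1], theta[0] ** 2])
J = np.array([[theta[1], theta[0]],
              [2 * theta[0], 0.0]])         # Jacobian dy/dtheta^T

# Elasticity matrix, equation (2.55): D(y)^{-1} (dy/dtheta^T) D(theta)
E = np.diag(1 / y) @ J @ np.diag(theta)
# y1 = theta1*theta2 has elasticity 1 to each parameter;
# y2 = theta1^2 has elasticity 2 to theta1 and 0 to theta2.
assert np.allclose(E, [[1.0, 1.0], [2.0, 0.0]])
```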

#### **2.8 Some Useful Matrix Calculus Results**

Several matrix calculus results will be used repeatedly. Many more can be found in Magnus and Neudecker (1988) and Abadir and Magnus (2005).

1. The matrix product **Y** = **AB**. Differentiate,

$$d\mathbf{Y} = (d\mathbf{A})\mathbf{B} + \mathbf{A}(d\mathbf{B}).\tag{2.56}$$

Then write (or imagine writing; with practice one does not actually need this step explicitly)

$$(d\mathbf{A})\,\mathbf{B} = \mathbf{I}\,(d\mathbf{A})\,\mathbf{B} \tag{2.57}$$

$$\mathbf{A}\left(d\mathbf{B}\right) = \mathbf{A}\left(d\mathbf{B}\right)\mathbf{I} \tag{2.58}$$

and apply the vec operator and Roth's theorem, to obtain

$$d\operatorname{vec}\mathbf{Y} = \left(\mathbf{B}^{\mathsf{T}} \otimes \mathbf{I}\right)d\operatorname{vec}\mathbf{A} + \left(\mathbf{I} \otimes \mathbf{A}\right)d\operatorname{vec}\mathbf{B}.\tag{2.59}$$

The chain rule gives, for any vector variable *θ*

$$\frac{d\mathbf{vec}\,\mathbf{Y}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\mathbf{B}^{\mathsf{T}} \otimes \mathbf{I}\right) \frac{d\mathbf{vec}\,\mathbf{A}}{d\boldsymbol{\theta}^{\mathsf{T}}} + \left(\mathbf{I} \otimes \mathbf{A}\right) \frac{d\mathbf{vec}\,\mathbf{B}}{d\boldsymbol{\theta}^{\mathsf{T}}}.\tag{2.60}$$
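The differential form (2.59) can be checked against a small finite perturbation of both factors; a Python/NumPy sketch with arbitrary random matrices:

```python
import numpy as np

def vec(X):
    # stack the columns of X, as in (2.9)
    return X.reshape(-1, 1, order="F")

rng = np.random.default_rng(3)
A = rng.random((2, 3)); dA = 1e-7 * rng.random((2, 3))
B = rng.random((3, 2)); dB = 1e-7 * rng.random((3, 2))

# Differential of Y = AB, equation (2.59):
# dvec Y = (B^T ⊗ I_2) dvec A + (I_2 ⊗ A) dvec B
dY_exact = (A + dA) @ (B + dB) - A @ B
dY_lin = (np.kron(B.T, np.eye(2)) @ vec(dA)
          + np.kron(np.eye(2), A) @ vec(dB))
assert np.allclose(vec(dY_exact), dY_lin, atol=1e-12)
```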

2. The Hadamard product **Y** = **A** ◦ **B**. Differentiate the product,

$$d\mathbf{Y} = d\mathbf{A} \circ \mathbf{B} + \mathbf{A} \circ d\mathbf{B},\tag{2.61}$$

then vec

$$d\text{vec}\,\mathbf{Y} = d\text{vec}\,\mathbf{A} \circ \text{vec}\,\mathbf{B} + \text{vec}\,\mathbf{A} \circ d\text{vec}\,\mathbf{B}.\tag{2.62}$$

It will be useful to replace the Hadamard products, which we do using the fact that **x** ◦ **y** = D *(***x***)***y**, to get

$$d\text{vec}\,\mathbf{Y} = \mathcal{D}\,(\text{vec}\,\mathbf{B})d\text{vec}\,\mathbf{A} + \mathcal{D}\,(\text{vec}\,\mathbf{A})d\text{vec}\,\mathbf{B}.\tag{2.63}$$

The chain rule gives the derivative from the differential,

$$\frac{d\mathbf{vec}\,\mathbf{Y}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \mathcal{D}\,(\mathbf{vec}\,\mathbf{B})\frac{d\mathbf{vec}\,\mathbf{A}}{d\boldsymbol{\theta}^{\mathsf{T}}} + \mathcal{D}\,(\mathbf{vec}\,\mathbf{A})\frac{d\mathbf{vec}\,\mathbf{B}}{d\boldsymbol{\theta}^{\mathsf{T}}}.\tag{2.64}$$
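The Hadamard case (2.63) admits the same kind of finite-perturbation check as the ordinary product; a Python/NumPy sketch with arbitrary random matrices:

```python
import numpy as np

def vec(X):
    return X.reshape(-1, 1, order="F")

rng = np.random.default_rng(4)
A = rng.random((2, 2)); dA = 1e-7 * rng.random((2, 2))
B = rng.random((2, 2)); dB = 1e-7 * rng.random((2, 2))

# Differential of the Hadamard product Y = A ∘ B, equation (2.63):
# dvec Y = D(vec B) dvec A + D(vec A) dvec B
dY_exact = (A + dA) * (B + dB) - A * B
dY_lin = (np.diagflat(vec(B)) @ vec(dA)
          + np.diagflat(vec(A)) @ vec(dB))
assert np.allclose(vec(dY_exact), dY_lin, atol=1e-12)
```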

3. Diagonal matrices. The diagonal matrix D *(***x***)*, with the vector **x** on the diagonal and zeros elsewhere, can be written

$$\mathcal{D}\left(\mathbf{x}\right) = \mathbf{I} \circ \left(\mathbf{1}\mathbf{x}^{\mathsf{T}}\right) \tag{2.65}$$

Differentiate both sides,

$$d\mathcal{D}\left(\mathbf{x}\right) = \mathbf{I} \circ \left(\mathbf{1} \, d\mathbf{x}^{\mathsf{T}}\right) \tag{2.66}$$

and vec the result

$$d\text{vec}\,\mathcal{D}\,(\mathbf{x}) = \mathcal{D}\,(\text{vec}\,\mathbf{I})\text{vec}\,\left(\mathbf{1}\,d\mathbf{x}^{\mathsf{T}}\right) \tag{2.67}$$

$$=\mathcal{D}\left(\mathrm{vec}\,\mathbf{I}\right)\left(\mathbf{I}\otimes\mathbf{1}\right)d\mathbf{x}\tag{2.68}$$

The First Identification Theorem gives

$$\frac{d\text{vec}\,\mathcal{D}\,(\mathbf{x})}{d\boldsymbol{\theta}^{\mathsf{T}}} = \mathcal{D}\,(\text{vec}\,\mathbf{I})\,(\mathbf{I}\otimes\mathbf{1})\,\frac{d\mathbf{x}}{d\boldsymbol{\theta}^{\mathsf{T}}}.\tag{2.69}$$

The identity matrix in (2.65) *masks* the matrix **1x**<sup>T</sup>, setting to zero all but the diagonal elements. Matrices other than **I** can be used in this way to mask entries of a matrix. For example, the transition matrix for a Leslie matrix, with a vector of survival probabilities **p** on the subdiagonal, is obtained by setting **x** = **p** and replacing **I** with a matrix **Z** that contains ones on the subdiagonal and zeros elsewhere (see, e.g., Chap. 4).

Some Markov chain calculations (Chaps. 5 and 11) involve a matrix **N**dg, which contains the diagonal elements of **N** on the diagonal and zeros elsewhere. This can be written

$$\mathbf{N}\_{\rm dg} = \mathbf{I} \circ \mathbf{N}.\tag{2.70}$$

Differentiating and applying the vec operator yields

$$d\text{vec}\,\mathbf{N}\_{\text{dg}} = \mathcal{D}\,(\text{vec}\,\mathbf{I})d\text{vec}\,\mathbf{N}.\tag{2.71}$$
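Because D(**x**) and **N**dg are linear in their arguments, relations (2.68) and (2.71) are exact, not merely first order. A Python/NumPy sketch, with arbitrary example values:

```python
import numpy as np

def vec(X):
    return X.reshape(-1, 1, order="F")

s = 3
x = np.array([0.2, 0.5, 0.9])
dx = 1e-3 * np.array([1.0, -2.0, 0.5])  # any size perturbation works here

# Equation (2.68): dvec D(x) = D(vec I)(I ⊗ 1) dx
lhs = vec(np.diag(x + dx) - np.diag(x))
M = np.diagflat(vec(np.eye(s))) @ np.kron(np.eye(s), np.ones((s, 1)))
assert np.allclose(lhs, M @ dx.reshape(-1, 1))

# Equation (2.71): dvec N_dg = D(vec I) dvec N, with N_dg = I ∘ N
N = np.arange(9.0).reshape(s, s)
dN = 1e-3 * np.ones((s, s))
lhs2 = vec(np.eye(s) * (N + dN) - np.eye(s) * N)
assert np.allclose(lhs2, np.diagflat(vec(np.eye(s))) @ vec(dN))
```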

4. The Kronecker product. Differentiating the Kronecker product is a bit more complicated (Magnus and Neudecker 1985, Theorem 11). We want an expression for the differential of the product in terms of the differentials of the components, something of the form

$$d\text{vec}\,\left(\mathbf{A}\otimes\mathbf{B}\right) = \mathbf{Z}\_1 d\text{vec}\,\mathbf{A} + \mathbf{Z}\_2 d\text{vec}\,\mathbf{B} \tag{2.72}$$

for some matrices **Z**<sup>1</sup> and **Z**2.

This requires a result for the vec of the Kronecker product. Let **A** be of dimension *m* × *p* and **B** be *r* × *s*. Then

$$\text{vec}\,\left(\mathbf{A}\otimes\mathbf{B}\right) = \left(\mathbf{I}\_p\otimes\mathbf{K}\_{s,m}\otimes\mathbf{I}\_r\right)\left(\text{vec}\,\mathbf{A}\otimes\text{vec}\,\mathbf{B}\right).\tag{2.73}$$

Let **Y** = **A** ⊗ **B**. Differentiate,

$$d\mathbf{Y} = (d\mathbf{A} \otimes \mathbf{B}) + (\mathbf{A} \otimes d\mathbf{B}) \tag{2.74}$$


and vec

$$d\text{vec}\,\mathbf{Y} = \left(\mathbf{I}\_p \otimes \mathbf{K}\_{s,m} \otimes \mathbf{I}\_r\right) \left[ (d\text{vec}\,\mathbf{A} \otimes \text{vec}\,\mathbf{B}) + (\text{vec}\,\mathbf{A} \otimes d\text{vec}\,\mathbf{B}) \right]. \tag{2.75}$$

With some ingenious simplifications (Magnus and Neudecker 1985), this reduces to (2.72) with

$$\mathbf{Z}\_{1} = \left(\mathbf{I}\_{p} \otimes \mathbf{K}\_{s,m} \otimes \mathbf{I}\_{r}\right) \left(\mathbf{I}\_{m} \otimes \text{vec}\,\mathbf{B}\right) \tag{2.76}$$

$$\mathbf{Z}\_2 = \left(\mathbf{I}\_p \otimes \mathbf{K}\_{s,m} \otimes \mathbf{I}\_r\right) \left(\mathbf{vec} \,\mathbf{A} \otimes \mathbf{I}\_{rs}\right). \tag{2.77}$$

Substituting **Z**<sup>1</sup> and **Z**<sup>2</sup> into (2.72) gives the differential of the Kronecker product in terms of the differentials of its component matrices.
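The key identity (2.73) is easy to confirm numerically, reusing the construction of the vec-permutation matrix from (2.12); the helper functions below are my own, not library routines:

```python
import numpy as np

def vec(X):
    return X.reshape(-1, 1, order="F")

def vec_perm(m, n):
    # vec-permutation matrix K_{m,n}, equation (2.12)
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            E = np.zeros((m, n))
            E[i, j] = 1.0
            K += np.kron(E, E.T)
    return K

rng = np.random.default_rng(5)
m, p, r, s = 2, 3, 2, 2
A = rng.random((m, p))
B = rng.random((r, s))

# Identity (2.73): vec(A ⊗ B) = (I_p ⊗ K_{s,m} ⊗ I_r)(vec A ⊗ vec B)
P = np.kron(np.eye(p), np.kron(vec_perm(s, m), np.eye(r)))
assert np.allclose(vec(np.kron(A, B)), P @ np.kron(vec(A), vec(B)))
```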

5. The matrix inverse. The inverse of **X** satisfies

$$\mathbf{X}\mathbf{X}^{-1} = \mathbf{I}.\tag{2.78}$$

Differentiate both sides

$$(d\mathbf{X})\,\mathbf{X}^{-1} + \mathbf{X}\left(d\mathbf{X}^{-1}\right) = \mathbf{0},\tag{2.79}$$

then vec

$$
\left[ \left( \mathbf{X}^{-1} \right)^{\mathsf{T}} \otimes \mathbf{I} \right] d\mathbf{vec} \,\mathbf{X} + \left[ \mathbf{I} \otimes \mathbf{X} \right] d\mathbf{vec} \,\mathbf{X}^{-1} = \mathbf{0} \tag{2.80}
$$

and finally solve for *d*vec **X**<sup>−1</sup>

$$d\text{vec}\,\mathbf{X}^{-1} = -\left[\mathbf{I}\otimes\mathbf{X}\right]^{-1}\left[\left(\mathbf{X}^{-1}\right)^{\mathsf{T}}\otimes\mathbf{I}\right]d\text{vec}\,\mathbf{X} \tag{2.81}$$

The properties (2.5) and (2.8) of the Kronecker product let this be simplified to

$$d\text{vec}\,\mathbf{X}^{-1} = -\left[\left(\mathbf{X}^{-1}\right)^{\mathsf{T}} \otimes \mathbf{X}^{-1}\right]d\text{vec}\,\mathbf{X} \tag{2.82}$$
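Equation (2.82) can be checked against a small finite perturbation of an arbitrary (well-conditioned) matrix; a Python/NumPy sketch:

```python
import numpy as np

def vec(X):
    return X.reshape(-1, 1, order="F")

rng = np.random.default_rng(6)
X = rng.random((3, 3)) + 3 * np.eye(3)   # arbitrary, comfortably nonsingular
dX = 1e-7 * rng.random((3, 3))

# Differential of the inverse, equation (2.82):
# dvec X^{-1} = -[(X^{-1})^T ⊗ X^{-1}] dvec X
Xinv = np.linalg.inv(X)
d_exact = np.linalg.inv(X + dX) - Xinv
d_lin = -np.kron(Xinv.T, Xinv) @ vec(dX)
assert np.allclose(vec(d_exact), d_lin, atol=1e-12)
```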

6. The square root and ratios. In calculating standard deviations and coefficients of variation it is useful to calculate the elementwise square root and the elementwise ratio of two vectors. If **x** is a non-negative vector, and the square root √**x** is taken elementwise, then

$$d\sqrt{\mathbf{x}} = \frac{1}{2} \mathcal{D} \left(\sqrt{\mathbf{x}}\right)^{-1} d\mathbf{x}.\tag{2.83}$$

For the elementwise ratio, let **x** and **y** be *m* × 1 vectors, with **y** nonzero. Let **w** be the vector whose *i*th element is *xi/yi*; i.e., **w** = D*(***y***)*<sup>−1</sup>**x**. Then

$$d\mathbf{w} = \mathcal{D}\left(\mathbf{y}\right)^{-1} d\mathbf{x} - \left[\mathbf{x}^{\mathsf{T}} \mathcal{D}\left(\mathbf{y}\right)^{-1} \otimes \mathcal{D}\left(\mathbf{y}\right)^{-1}\right] \mathcal{D}\left(\mathbf{vec} \,\mathbf{I}\_m\right) \left(\mathbf{I}\_m \otimes \mathbf{1}\_m\right) d\mathbf{y}.\tag{2.84}$$
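Equation (2.84) can also be checked against a small finite perturbation; a Python/NumPy sketch with arbitrary example vectors:

```python
import numpy as np

m = 3
x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, 4.0, 1.5])
dx = 1e-8 * np.array([1.0, -1.0, 2.0])
dy = 1e-8 * np.array([-2.0, 1.0, 0.5])

# Elementwise ratio w = x/y; compare the actual change with the
# first-order prediction of equation (2.84)
w_exact = (x + dx) / (y + dy) - x / y
Dyinv = np.diag(1.0 / y)
mask = np.diagflat(np.eye(m).reshape(-1, 1)) @ np.kron(np.eye(m), np.ones((m, 1)))
dw = (Dyinv @ dx.reshape(-1, 1)
      - np.kron(x.reshape(1, -1) @ Dyinv, Dyinv) @ mask @ dy.reshape(-1, 1))
assert np.allclose(w_exact.reshape(-1, 1), dw, atol=1e-12)
```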

This list could go on. The books by Magnus and Neudecker (1988) and Abadir and Magnus (2005) contain many other results, and demographically relevant derivations appear throughout this book, especially in Chap. 5.

#### **2.9 LTRE Decomposition of Demographic Differences**

The LTRE decomposition in Sect. 1.3.1 extends readily to matrix calculus. Suppose that a demographic outcome *ξ*, of dimension *s* × 1, is a function of a vector *θ* of parameters, of dimension *p* × 1. Suppose that results are obtained under two "conditions," with parameters *θ*<sup>(1)</sup> and *θ*<sup>(2)</sup>. Define the parameter difference as Δ*θ* = *θ*<sup>(2)</sup> − *θ*<sup>(1)</sup> and the effect as Δ*ξ* = *ξ*<sup>(2)</sup> − *ξ*<sup>(1)</sup>. Then, to first order,

$$\Delta\boldsymbol{\xi} \approx \sum\_{i=1}^{p} \frac{d\boldsymbol{\xi}}{d\theta\_i} \Delta\theta\_i \tag{2.85}$$

$$= \frac{d\boldsymbol{\xi}}{d\boldsymbol{\theta}^{\mathsf{T}}}\,\Delta\boldsymbol{\theta}.\tag{2.86}$$

Writing

$$\Delta\boldsymbol{\theta} = \mathcal{D}\,(\Delta\boldsymbol{\theta})\,\mathbf{1}\_p,\tag{2.87}$$

we create a *contribution matrix* **C**, of dimension *s* × *p*,

$$\mathbf{C} = \frac{d\boldsymbol{\xi}}{d\boldsymbol{\theta}^{\mathsf{T}}} \mathcal{D}\,(\boldsymbol{\Delta}\boldsymbol{\theta}).\tag{2.88}$$

The *(i, j )* entry of **C** is the contribution of Δ*θj* to the difference Δ*ξi*, for *i* = 1*,...,s* and *j* = 1*,...,p*. The rows and columns of **C** give

$$\mathbf{C}(i,:) = \text{contributions of } \Delta\boldsymbol{\theta} \text{ to } \Delta\xi\_i \tag{2.89}$$

$$\mathbf{C}(:,j) = \text{contribution of } \Delta\theta\_j \text{ to } \Delta\boldsymbol{\xi} \tag{2.90}$$

When calculating **C**, the derivative of *ξ* must be evaluated somewhere. Experience suggests that evaluating it at the midpoint between *θ*<sup>(1)</sup> and *θ*<sup>(2)</sup> gives good results (Logofet and Lesnaya 1997; Caswell 2001).
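As a small illustration of the decomposition (2.88), the Python/NumPy sketch below uses a hypothetical toy outcome *ξ(θ)* = (*θ*1*θ*2, *θ*1 + *θ*2), not a demographic model, with the derivative evaluated at the midpoint:

```python
import numpy as np

# Hypothetical outcome xi(theta) = (theta1*theta2, theta1 + theta2)
def xi(t):
    return np.array([t[0] * t[1], t[0] + t[1]])

theta1 = np.array([1.0, 2.0])        # parameters under condition 1
theta2 = np.array([1.2, 2.5])        # parameters under condition 2
dtheta = theta2 - theta1
mid = (theta1 + theta2) / 2          # evaluate the derivative at the midpoint

J = np.array([[mid[1], mid[0]],      # d xi / d theta^T at the midpoint
              [1.0, 1.0]])

C = J @ np.diag(dtheta)              # contribution matrix, equation (2.88)
# Row sums of C recover the total effect Delta xi
# (exactly here, because this xi is bilinear in theta)
assert np.allclose(C.sum(axis=1), xi(theta2) - xi(theta1))
```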

#### **2.10 A Protocol for Sensitivity Analysis**

The calculations may grow to be complex, but the protocol is simple:


The rest of this book shows what can be done with this simple procedure.

#### **Bibliography**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Part II Linear Models**

# **Chapter 3 The Sensitivity of Population Growth Rate: Three Approaches**

#### **3.1 Introduction**

The essence of stable population theory is the fact that a population subject to time-invariant vital rates will (with a few exceptions not of interest here) converge to a stable structure and grow exponentially at a constant rate (the population growth rate, or intrinsic rate of increase). The calculation of the population growth rate from the vital rates is one of the most important accomplishments of formal demography (Sharpe and Lotka 1911).<sup>1</sup> Ecologists recognized early on that, by integrating survival and fertility over the life course, the population growth rate provided a powerful tool for describing the population consequences of environmental conditions (e.g., Birch 1953). For the same reason, evolutionary biologists recognized it as a measure of fitness (Fisher 1930), although that concept requires careful consideration of both demographic and genetic processes (Charlesworth 1994; de Vries and Caswell 2018).

This makes the sensitivity analysis of population growth rate an important problem. It has been approached in three ways. The earliest approach (Hamilton 1966) is specific to age-classified models, and relies on differentiation of the characteristic equation. The second (Caswell 1978) applies to stage-classified as well as age-classified models, and uses eigenvalue perturbation theory. The third is based on matrix calculus and is more flexible than its predecessors.

Chapter 3 is modified, under the terms of a Creative Commons Attribution License, from Caswell, H. 2010. Reproductive value, the stable stage distribution, and the sensitivity of the population growth rate to changes in vital rates. Demographic Research 23:531–548, ©Hal Caswell.

<sup>1</sup>Leonard Euler had obtained the result in 1760, but his derivation was forgotten until it was rediscovered in 1970 (Keyfitz and Keyfitz 1970).


#### **3.2 Hamilton's Equation for Age-Classified Populations**

Consider an age-classified model, in which age *x* is a continuous variable, with mortality rate *μ(x)* and maternity function *m(x)*. The survivorship function is

$$\ell(x) = \exp\left(-\int\_0^x \mu(a)\, da\right) \tag{3.1}$$

and the population growth rate *r* is the solution to the Euler-Lotka equation

$$1 = \int\_0^\infty e^{-ra} \ell(a) m(a) da. \tag{3.2}$$

The stable age distribution, reproductive value function, birth rate, and generation time (mean age of reproduction in the stable population) are given by

$$c(x) = \frac{e^{-rx}\ell(x)}{\int\_0^\infty e^{-ra}\ell(a)\,da} \qquad \text{stable age distribution} \tag{3.3}$$

$$v(x) = \frac{e^{rx}}{\ell(x)} \int\_{x}^{\infty} e^{-ra} \ell(a) m(a)\, da \qquad \text{reproductive value} \tag{3.4}$$

$$b = \left[\int\_0^\infty e^{-ra} \ell(a) da\right]^{-1} \qquad \text{birth rate} \tag{3.5}$$

$$\bar{A} = \int\_0^\infty a e^{-ra} \ell(a) m(a) da \qquad \text{generation time} \tag{3.6}$$

**Sensitivity of** *r* Hamilton (1966) derived the sensitivities of *r* to changes in mortality and fertility at a specified age *x*. His results are equivalent to

$$\frac{dr}{d\mu(x)} = \frac{-c(x)v(x)}{b\bar{A}}\tag{3.7}$$

$$\frac{dr}{dm(x)} = \frac{c(x)}{b\bar{A}}\tag{3.8}$$

That is, the sensitivity of *r* to a change in mortality at age *x* is proportional to the product of the reproductive value at age *x* and the abundance of age *x* in the stable age distribution. The sensitivity of *r* to a change in fertility at age *x* is proportional to the stable age distribution (and the reproductive value at age 0, which equals 1). The proportionality constant in each case is the inverse of the product of the birth rate and the mean age of reproduction.
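These formulas can be checked numerically. The sketch below (Python/NumPy; the constant hazard, the maternity schedule, the age grid, and the perturbation age are all hypothetical choices, not values from the text) discretizes the Euler-Lotka equation (3.2), solves it by bisection, and compares a finite-difference derivative of *r* with the prediction of (3.8).

```python
import numpy as np

da = 0.01
age = np.arange(0.0, 100.0, da) + da / 2          # midpoints of age intervals
l = np.exp(-0.05 * age)                           # survivorship l(x), constant hazard
m = np.where((age > 15) & (age < 45), 0.1, 0.0)   # maternity function m(x)

def lotka(r, m):
    """Discretized Euler-Lotka integral minus 1; zero at the true r."""
    return np.sum(np.exp(-r * age) * l * m) * da - 1.0

def solve_r(m):
    """Bisection on the (monotone decreasing) Euler-Lotka function."""
    lo, hi = -1.0, 1.0
    for _ in range(80):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if lotka(mid, m) > 0 else (lo, mid)
    return (lo + hi) / 2

r = solve_r(m)
b = 1.0 / (np.sum(np.exp(-r * age) * l) * da)        # birth rate, Eq. (3.5)
Abar = np.sum(age * np.exp(-r * age) * l * m) * da   # generation time, Eq. (3.6)
c = b * np.exp(-r * age) * l                         # stable age distribution, Eq. (3.3)

# Check Eq. (3.8): dr/dm(x) = c(x)/(b*Abar).  Approximate an impulse
# perturbation of fertility at age 30 by adding eps/da to one grid cell.
i = np.searchsorted(age, 30.0)
eps = 1e-6
m2 = m.copy(); m2[i] += eps / da
dr_dm = (solve_r(m2) - r) / eps
print(dr_dm, c[i] / (b * Abar))    # the two values should agree closely
```

In the discretized system the two expressions agree exactly to first order in the perturbation, which is why a fairly coarse grid already reproduces (3.8) well.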

**Derivation** Hamilton's results are obtained by implicit differentiation of the Euler-Lotka equation (3.2). We will derive Hamilton's original formulation and then show how it reduces to the relation between the stable age distribution and the reproductive value distribution in (3.7) and (3.8).

First, introduce a perturbation parameter *θ* to measure the change in mortality or fertility at the specified age. Writing survival, fertility, and *r* as functions of *θ* gives the Euler-Lotka equation

$$1 = \int\_0^\infty e^{-r(\theta)a} \ell(\theta, a) m(\theta, a) \, da. \tag{3.9}$$

Differentiating both sides of (3.9) with respect to *θ* gives

$$0 = -\frac{dr(\theta)}{d\theta} \int\_0^\infty a e^{-r(\theta)a} \ell(\theta, a) m(\theta, a) \, da$$

$$+ \int\_0^\infty e^{-r(\theta)a} \frac{d\ell(\theta, a)}{d\theta} m(\theta, a) \, da$$

$$+ \int\_0^\infty e^{-r(\theta)a} \ell(\theta, a) \frac{dm(\theta, a)}{d\theta} \, da. \tag{3.10}$$

Solving (3.10) for *dr/dθ* gives

$$\frac{dr(\theta)}{d\theta} = \frac{1}{\bar{A}} \left( \underbrace{\int\_0^\infty e^{-r(\theta)a} \frac{d\ell(\theta, a)}{d\theta} m(\theta, a) \, da}\_{\text{mortality}} + \underbrace{\int\_0^\infty e^{-r(\theta)a} \ell(\theta, a) \frac{dm(\theta, a)}{d\theta} \, da}\_{\text{fertility}} \right) \tag{3.11}$$

Equation (3.11) has two terms, one capturing effects of *θ* on mortality and the other capturing effects on fertility.

#### *3.2.1 Effects of Changes in Mortality*

We want to perturb mortality at one exact age *x* (remember that age and time are continuous), leaving mortality at all other ages unchanged. To do this, we use the unit impulse function, or Dirac delta function. This is a generalized function defined by

$$\delta(x) = 0 \qquad x \neq 0 \tag{3.12}$$

$$\int\_{-\infty}^{\infty} \delta(s)ds = 1.\tag{3.13}$$

The unit impulse is used in signal processing (e.g., Kamen and Heck 1997, p. 7) to represent the limit of a perturbation of unit strength applied over a shorter and shorter time interval. Think of a normal distribution with mean 0, in the limit as the variance goes to 0, while the area under the curve remains at 1. The most useful properties of the unit impulse, for our application, are

$$\int\_{-\infty}^{\infty} \delta(a - x) f(a)\, da = f(x) \tag{3.14}$$

and

$$\int\_{-\infty}^{x} \delta(s)\, ds = H(x) \tag{3.15}$$

where *H(x)* is the Heaviside function, or unit step function, which satisfies *H(x)* = 0 for *x <* 0 and *H(x)* = 1 for *x >* 0.

We write mortality as

$$
\mu(\theta, a) = \mu(0, a) + \theta \delta(a - x) \tag{3.16}
$$

where *δ(x)* is the unit impulse function. The sensitivity of *r* to *μ(x)* is obtained as the derivative of *r* with respect to *θ*, evaluated at *θ* = 0,

$$\frac{dr}{d\mu(x)} = \left. \frac{dr}{d\theta} \right|\_{\theta=0}. \tag{3.17}$$

Because only mortality is affected by *θ*

$$\frac{dm(\theta, a)}{d\theta} = 0\tag{3.18}$$

$$\frac{d\mu(\theta, a)}{d\theta} = \delta(a - x). \tag{3.19}$$

From (3.1),

$$\frac{d\ell(\theta, a)}{d\theta} = -e^{-\int\_0^a \mu(\theta, s)\,ds} \int\_0^a \delta(s - x)\, ds \tag{3.20}$$

$$= -\ell(\theta, a)H(a - x). \tag{3.21}$$

Substituting into (3.11) and evaluating at *θ* = 0 gives

$$\frac{dr}{d\mu(x)} = \frac{-1}{\bar{A}} \left( \int\_{x}^{\infty} e^{-ra} \ell(a) m(a)\, da \right). \tag{3.22}$$

The integral in (3.22) is close to the reproductive value *v(x)* given by (3.4); specifically,

$$\int\_{x}^{\infty} e^{-ra} \ell(a) m(a)\, da = \ell(x) e^{-rx} v(x). \tag{3.23}$$

However, from (3.3) and (3.5), $\ell(x)e^{-rx} = c(x)/b$. Making these substitutions into (3.22) gives the formal relationship (3.7).

#### *3.2.2 Effects of Changes in Fertility*

Following the same approach, if the perturbation affects fertility at exact age *x*, we write

$$m(\theta, a) = m(0, a) + \theta \delta(a - \mathbf{x}).\tag{3.24}$$

Because only fertility is affected by *θ*, *dμ(θ , a)/dθ* = 0 and *dm(θ , a)/dθ* = *δ(a* − *x)*. Substituting these into (3.11) and evaluating the result at *θ* = 0 gives

$$\frac{dr}{dm(x)} = \frac{1}{\bar{A}} \left( e^{-rx} \ell(x) \right). \tag{3.25}$$

From (3.3) and (3.5) it can be seen that the term in parentheses equals *c(x)/b*, which leads to the formal relationship (3.8).

#### *3.2.3 History and Perspectives*

Hamilton (1966) obtained the relationship (3.22) in his analysis of the evolution of senescence. From (3.22) and (3.8) it is apparent that (provided *r* ≥ 0) the magnitudes of the sensitivities of *r* to mortality and fertility decline with age. These sensitivities measure the selection gradients on age-specific mortality and fertility. Thus Hamilton concluded that the strength of selection against deleterious mutations would necessarily decline with their age of action, that small positive effects at early ages could easily compensate for much larger negative effects at later ages, and that the evolution of senescence was therefore inevitable.

In the years that followed Hamilton's paper, several other authors developed perturbation analysis for *r*, using related methods. Demetrius (1969) used a discrete age-classified model, and Emlen (1970) used Hamilton's results to derive the dynamics of gene frequencies resulting from the selection gradients on age-specific survival and fertility.

Keyfitz (1971), in a remarkable paper, used implicit differentiation to obtain the sensitivity of population growth rate, life expectancy, birth rates, death rates, and the stable age distribution, apparently independently of Hamilton. He noted the appearance of reproductive value in the sensitivity of *r* to mortality. Goodman (1971) was apparently the first to note that the sensitivities of *r* to mortality and fertility could be expressed in terms of the stable age distribution and reproductive value.

When Hamilton's paper appeared, it was regarded as difficult and esoteric, but it had a great impact. It provided the analytical machinery for examining trade-offs between opposing demographic traits, known as antagonistic pleiotropy (Williams 1957; Rose 1991). It also describes the accumulation of deleterious mutations due to the balance between mutation and selection (e.g., Steinsaltz et al. 2005). These ideas are fundamental to the analysis of human aging (e.g., Rose 1991; Wachter and Finch 1997; Carey and Tuljapurkar 2003; Baudisch 2008) and, more generally, the analysis of life history evolution in humans and other species (e.g., Charlesworth 1994; Stearns 1992).

#### **3.3 Stage-Classified Populations: Eigenvalue Perturbations**

Implicit in Hamilton's analysis is the assumption that the vital rates are functions of age. In many cases, they are not. In humans, characteristics such as education, marital status, health status, or spatial location may provide important information in addition to age. In other species, the vital rates may depend on developmental stage or body size more than on age. Such populations are described by stage-classified demographic models, of which the age-classified theory is a special case.

Stage-classified demography can be analyzed using matrix population models (Leslie 1945; Caswell 2001). The discrete-time population growth rate *λ* is the dominant eigenvalue of the population projection matrix **A** (guaranteed to be real and positive by the Perron-Frobenius theorem). Let **n***(t)* be the population vector at time *t*, and **A** the population projection matrix, with

$$\mathbf{n}(t+1) = \mathbf{A}\mathbf{n}(t) \tag{3.26}$$

and the population growth rate is given by the dominant eigenvalue *λ* of **A**. The stable stage distribution is given by the corresponding right eigenvector **w** and the reproductive value function by the left eigenvector **v**; they satisfy

$$\mathbf{A}\mathbf{w} = \lambda\mathbf{w} \tag{3.27}$$

$$\mathbf{v}^{\mathsf{T}}\mathbf{A}=\lambda\mathbf{v}^{\mathsf{T}}\tag{3.28}$$

**Sensitivity of** *λ* The effects of perturbations on population growth are approached by looking for the sensitivity of an eigenvalue to changes in the entries of a matrix. We will see that the sensitivity of *λ* to a change in the entry *aij* of **A** is (Caswell 1978)

$$\frac{\partial \lambda}{\partial a\_{ij}} = \frac{v\_i w\_j}{\mathbf{v}^{\mathsf{T}} \mathbf{w}}.\tag{3.29}$$

The entry *aij* measures the per-capita production of stage *i* by stage *j* . Thus the effect of a change in *aij* is proportional to the reproductive value of the destination stage and to the abundance of the origin stage in the stable population. This is a generalization of the relationships (3.7) and (3.8) obtained from Hamilton's analysis.

**Derivation** The eigenvalue *λ* is a solution to the characteristic equation of **A**, which generalizes the Euler-Lotka equation (3.2). Except in special cases, however, the characteristic equation cannot be written down explicitly, making the implicit differentiation approach used by Hamilton impossible. Instead, the relationship (3.29) is obtained by a perturbation expansion. Suppose that **A** is perturbed to **A** + Δ**A**. This will result in perturbations Δ*λ* of *λ* and Δ**w** of **w**, which must satisfy

$$(\mathbf{A} + \Delta \mathbf{A}) \left( \mathbf{w} + \Delta \mathbf{w} \right) = (\lambda + \Delta \lambda)(\mathbf{w} + \Delta \mathbf{w}).\tag{3.30}$$

Expanding the products, setting second order terms to zero, and remembering that **Aw** = *λ***w** gives

$$\mathbf{A}(\Delta \mathbf{w}) + (\Delta \mathbf{A})\mathbf{w} = \lambda(\Delta \mathbf{w}) + (\Delta \lambda)\mathbf{w}.\tag{3.31}$$

Multiply on the left by **v**<sup>T</sup> and simplify to obtain

$$(\Delta\lambda)\mathbf{v}^{\mathsf{T}}\mathbf{w} = \mathbf{v}^{\mathsf{T}}(\Delta\mathbf{A})\mathbf{w}.\tag{3.32}$$

If the perturbation affects only one entry, say *aij* , of **A**, then

$$
\Delta\lambda = \frac{v\_i w\_j \left(\Delta a\_{ij}\right)}{\mathbf{v}^\mathsf{T} \mathbf{w}}.\tag{3.33}
$$

Dividing both sides by Δ*aij* and taking the limit as Δ*aij* → 0 gives the sensitivity result (3.29).
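The result (3.29) is easy to verify numerically. In the Python/NumPy sketch below, the 3 × 3 projection matrix is hypothetical (any primitive non-negative matrix would serve); the full sensitivity matrix **v w**<sup>T</sup>/(**v**<sup>T</sup>**w**) is compared with a finite-difference perturbation of a single entry.

```python
import numpy as np

# A hypothetical 3-stage projection matrix
A = np.array([[0.0, 1.5, 2.0],
              [0.6, 0.0, 0.0],
              [0.0, 0.7, 0.8]])

def dominant(A):
    """Dominant eigenvalue with its right (w) and left (v) eigenvectors."""
    d, W = np.linalg.eig(A)
    k = np.argmax(d.real)
    lam, w = d[k].real, np.abs(W[:, k].real)   # Perron vector is positive
    d2, V = np.linalg.eig(A.T)
    v = np.abs(V[:, np.argmax(d2.real)].real)
    return lam, w, v

lam, w, v = dominant(A)

# Sensitivity matrix: S[i, j] = v_i w_j / (v^T w), Eq. (3.29)
S = np.outer(v, w) / (v @ w)

# Finite-difference check on a single entry (zero-based indices)
h = 1e-6
A2 = A.copy(); A2[2, 1] += h
fd = (dominant(A2)[0] - lam) / h
print(S[2, 1], fd)                 # should agree closely
```

Taking the absolute value of the computed eigenvectors is harmless here because, by Perron-Frobenius, the dominant eigenvectors of a primitive non-negative matrix can be taken strictly positive.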

#### *3.3.1 Age-Classified Models as a Special Case*

To compare (3.29) with Hamilton's results (3.7) and (3.8), consider an age-classified matrix (a Leslie matrix) with fertilities *Fi* in the first row, survival probabilities *Pi* on the subdiagonal, and zeros elsewhere (Leslie 1945; Keyfitz 1968). In this case (3.29) simplifies to

$$\frac{\partial \lambda}{\partial P\_i} = \frac{v\_{i+1} w\_i}{\mathbf{v}^{\mathsf{T}} \mathbf{w}} \tag{3.34}$$

$$\frac{\partial \lambda}{\partial F\_i} = \frac{v\_1 w\_i}{\mathbf{v}^{\mathsf{T}} \mathbf{w}}. \tag{3.35}$$

Equation (3.34) corresponds to (3.7); the sensitivity is proportional to the product of the reproductive value and the stable stage distribution. Equation (3.35) corresponds to (3.8), and shows why reproductive value is apparently missing from (3.8): reproductive value at birth [*v*(0) in Hamilton's notation] is scaled to equal 1.

#### *3.3.2 Sensitivity to Lower-Level Demographic Parameters*

The entries of **A** are often functions of other, lower-level parameters. The sensitivity of *λ* to these parameters is obtained by the chain rule. For example, suppose that stage 1 may contribute individuals to stages 2 or 3 (Fig. 3.1). Write the transition probabilities as

$$a\_{21} = \gamma \sigma \tag{3.36}$$

$$a\_{31} = (1 - \gamma)\sigma \tag{3.37}$$

where *σ* is the survival probability and *γ* the probability that the individual moves to stage 2, conditional on survival. Then the sensitivities of *λ* to *σ* and to *γ* are given by

$$\frac{d\lambda}{d\sigma} = \frac{\partial\lambda}{\partial a\_{21}}\frac{da\_{21}}{d\sigma} + \frac{\partial\lambda}{\partial a\_{31}}\frac{da\_{31}}{d\sigma} \tag{3.38}$$

$$=\frac{w\_1\left[\gamma v\_2 + (1-\gamma)v\_3\right]}{\mathbf{v}^\mathsf{T}\mathbf{w}}\tag{3.39}$$

**Fig. 3.1** An example of lower-level parameters appearing in a portion of a life cycle. Individuals in stage 1 survive with probability *σ*, and, conditional on survival, move to stage 2 with probability *γ* and to stage 3 with probability 1 − *γ*


$$\frac{d\lambda}{d\gamma} = \frac{\partial\lambda}{\partial a\_{21}}\frac{da\_{21}}{d\gamma} + \frac{\partial\lambda}{\partial a\_{31}}\frac{da\_{31}}{d\gamma} \tag{3.40}$$

$$=\frac{\sigma w\_1 \left(v\_2 - v\_3\right)}{\mathbf{v}^{\mathsf{T}}\mathbf{w}}.\tag{3.41}$$

The sensitivity to survival is proportional to the weighted average of the reproductive values of the destination stages, and the sensitivity to the transition probability *γ* is proportional to the difference in reproductive value between the destination stages.
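These lower-level sensitivities can be checked in the same way. The sketch below (Python/NumPy) embeds the life-cycle fragment of Fig. 3.1 in a hypothetical 3-stage matrix (the entries other than *a*21 and *a*31 are arbitrary constants, not from the text), applies (3.39) and (3.41), and compares the results with central finite differences.

```python
import numpy as np

def dominant(A):
    """Dominant eigenvalue with right (w) and left (v) eigenvectors."""
    d, W = np.linalg.eig(A)
    k = np.argmax(d.real)
    lam, w = d[k].real, np.abs(W[:, k].real)
    d2, V = np.linalg.eig(A.T)
    v = np.abs(V[:, np.argmax(d2.real)].real)
    return lam, w, v

# Hypothetical model containing the fragment of Fig. 3.1:
# a21 = sigma*gamma and a31 = sigma*(1 - gamma)
def build_A(sigma, gamma):
    return np.array([[0.0,                  1.0, 3.0],
                     [sigma * gamma,        0.3, 0.0],
                     [sigma * (1 - gamma),  0.2, 0.4]])

sigma, gamma = 0.8, 0.6
lam, w, v = dominant(build_A(sigma, gamma))

# Eq. (3.39): dlam/dsigma = w1 [gamma*v2 + (1-gamma)*v3] / (v^T w)
dl_dsigma = w[0] * (gamma * v[1] + (1 - gamma) * v[2]) / (v @ w)
# Eq. (3.41): dlam/dgamma = sigma * w1 * (v2 - v3) / (v^T w)
dl_dgamma = sigma * w[0] * (v[1] - v[2]) / (v @ w)

# Central-difference checks
h = 1e-6
fd_sigma = (dominant(build_A(sigma + h, gamma))[0]
            - dominant(build_A(sigma - h, gamma))[0]) / (2 * h)
fd_gamma = (dominant(build_A(sigma, gamma + h))[0]
            - dominant(build_A(sigma, gamma + -h))[0]) / (2 * h)
print(dl_dsigma, fd_sigma)
print(dl_dgamma, fd_gamma)
```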

#### *3.3.3 History*

I first encountered the basis for this perturbation expansion in a paper by C.A. Desoer in the proceedings of an engineering conference (Desoer 1967).<sup>2</sup> Eigenvalue perturbations were of particular interest to engineers in the 1960s as part of a shift from frequency-domain methods to state-space methods in the study of linear systems (Zadeh and Desoer 1963). However, the result dates back to Jacobi (1846), and has been independently rediscovered many times (e.g., Faddeev 1959; Papoulis 1966; Franklin 1968). In population biology, this perturbation approach has been extended to many other sensitivity problems, including the sensitivity of subdominant eigenvalues and transient behavior, of growth rates in periodic and stochastic environments, of the eigenvectors, and of the spreading speed in biological or demographic invasions (see Caswell (2001) for reviews and references).

#### **3.4 Growth Rate Sensitivity via Matrix Calculus**

Matrix calculus provides a still more general approach to the sensitivity analysis of the population growth rate. Equation (3.29) perturbs only a single entry of **A**; derivatives with respect to other parameters are assembled by summing their effects over all the entries of **A**, as in (3.41). Using matrix calculus, we now consider *λ* as a scalar function of **A** and **A** as a matrix-valued function of a parameter vector *θ*.

**Sensitivity of** *λ* We will show that the derivative of *λ* with respect to *θ* is

$$\frac{d\lambda}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\frac{\mathbf{w}^{\mathsf{T}} \otimes \mathbf{v}^{\mathsf{T}}}{\mathbf{v}^{\mathsf{T}}\mathbf{w}}\right) \left(\frac{d\,\text{vec}\,\mathbf{A}}{d\boldsymbol{\theta}^{\mathsf{T}}}\right),\tag{3.42}$$

<sup>2</sup>By a fortunate accident; I was searching for something completely different. We may wonder whether the chances of such coincidences are higher or lower in the internet search era.

where ⊗ denotes the Kronecker product. If *θ* is a *p* × 1 vector of parameters, then *dλ/dθ*<sup>T</sup> is a 1 × *p* matrix whose *i*th entry is *dλ/dθi*.

**Derivation** Following the steps in Chap. 2, begin by taking the differential of both sides of (3.27) to give

$$(d\mathbf{A})\mathbf{w} + \mathbf{A}(d\mathbf{w}) = (d\lambda)\mathbf{w} + \lambda(d\mathbf{w}).\tag{3.43}$$

Multiply both sides on the left by **v**<sup>T</sup> and simplify to obtain

$$(d\lambda)\mathbf{v}^{\mathsf{T}}\mathbf{w} = \mathbf{v}^{\mathsf{T}}(d\mathbf{A})\mathbf{w} \tag{3.44}$$

Next, apply the vec operator to both sides of (3.44). Since the left side is a scalar, the vec operator has no effect. The right side is a product of three quantities, so Roth's theorem implies that

$$(d\lambda)\mathbf{v}^{\mathsf{T}}\mathbf{w} = \left(\mathbf{w}^{\mathsf{T}}\otimes\mathbf{v}^{\mathsf{T}}\right) d\,\text{vec}\,\mathbf{A}.\tag{3.45}$$

The First Identification Theorem then gives

$$\frac{d\lambda}{d\text{vec}^{\mathsf{T}}\mathbf{A}} = \frac{\mathbf{w}^{\mathsf{T}} \otimes \mathbf{v}^{\mathsf{T}}}{\mathbf{v}^{\mathsf{T}}\mathbf{w}}.\tag{3.46}$$

Finally, the chain rule (2.18) gives us

$$\frac{d\lambda}{d\theta^{\mathsf{T}}} = \frac{d\lambda}{d\text{vec}^{\mathsf{T}}\mathbf{A}} \frac{d\text{vec}\,\mathbf{A}}{d\theta^{\mathsf{T}}}.\tag{3.47}$$

The matrix calculus approach is particularly powerful because of the flexibility in specifying the effect of *θ* on the vital rates. Suppose that **A** depends on a vector *σ* of survival probabilities, which are a function of the concentration *X* of a pollutant, which in turn is changing as a function of time *t*. The rate of change of *λ* over time is

$$\frac{d\lambda}{dt} = \left(\frac{d\lambda}{d\text{vec}^{\mathsf{T}}\mathbf{A}}\right) \left(\frac{d\text{vec}\,\mathbf{A}}{d\sigma^{\mathsf{T}}}\right) \left(\frac{d\sigma}{dX}\right) \left(\frac{dX}{dt}\right) \tag{3.48}$$

Each of the terms in (3.48) can be evaluated separately; the matrix product gives the correct dimension for the final sensitivity result (a 1 × 1 scalar in this case).
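The vec-based formula (3.46) and the chain rule (3.47) can be illustrated with a small numerical sketch (Python/NumPy; the 2 × 2 matrix and the parameter dependence *a*12 = 2*θ* are hypothetical choices, not from the text).

```python
import numpy as np

A = np.array([[0.0, 2.0],
              [0.5, 0.6]])

d, W = np.linalg.eig(A)
k = np.argmax(d.real)
lam, w = d[k].real, np.abs(W[:, k].real)
d2, V = np.linalg.eig(A.T)
v = np.abs(V[:, np.argmax(d2.real)].real)

# Eq. (3.46): dlam/d vec^T A = (w^T kron v^T)/(v^T w), a 1 x 4 row vector
dlam_dvecA = np.kron(w, v) / (v @ w)

# vec stacks columns, so entry i + 2j corresponds to a_ij and should
# equal v_i w_j / (v^T w), i.e. the sensitivity matrix of Eq. (3.29)
S = np.outer(v, w) / (v @ w)

# Chain rule (3.47) for a hypothetical scalar parameter theta with
# a12 = 2*theta (here theta = 1), so d vec A / d theta = (0, 0, 2, 0)^T
dvecA_dtheta = np.array([0.0, 0.0, 2.0, 0.0])
dlam_dtheta = dlam_dvecA @ dvecA_dtheta

# Finite-difference check of the chain-rule result
h = 1e-7
A2 = A.copy(); A2[0, 1] = 2.0 * (1 + h)     # theta -> 1 + h
fd = (np.max(np.linalg.eigvals(A2).real) - lam) / h
print(dlam_dtheta, fd)
```

Note the column-stacking convention: `np.kron(w, v)` reproduces **w**<sup>T</sup> ⊗ **v**<sup>T</sup>, whose entries line up with vec **A** taken column by column.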

#### **3.5 Second Derivatives of Population Growth Rate**

The second derivatives of *λ* measure the curvature of the response to changes in parameters. They have important applications in evolutionary demography, where they indicate the action of stabilizing, disruptive, or correlational selection on fitness-related traits (e.g., Phillips and Arnold 1989; Caswell 2001), in adaptive dynamics, where they help determine the stability of evolutionary singular strategies (e.g., Diekmann 2004), and in extending sensitivity analysis to second-order effects.

Since the first derivatives of *λ* are written, in Eqs. (3.29) and (3.46), in terms of the right and left eigenvectors of **A**, the second derivatives of *λ* require the first derivatives of those eigenvectors. Caswell (1996) derived the second derivatives of *λ* with respect to the entries of **A** by an extension of the method in Sect. 3.3. However, a more general and rigorous method is available using matrix calculus.

Consider a (scalar) variable *ξ* which is a function of a vector *θ* of parameters. The complete set of second derivatives of *ξ* is given by the Hessian matrix

$$\mathbf{H} = \left(\frac{\partial^2 \xi}{\partial \theta\_i \partial \theta\_j}\right) \tag{3.49}$$

Magnus and Neudecker (1988) proved (their Second Identification Theorem) that if the second differential of *ξ* can be written as

$$d^2\xi = d\theta^\mathsf{T} \mathbf{B} d\theta \tag{3.50}$$

for some matrix **B**, then

$$\mathbf{H} = \frac{1}{2} \left( \mathbf{B} + \mathbf{B}^{\mathsf{T}} \right). \tag{3.51}$$
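A minimal numerical illustration of this identification (assuming nothing beyond (3.50) and (3.51)): for the quadratic *ξ(θ)* = *θ*<sup>T</sup>**B***θ* with a non-symmetric **B**, the second differential is *d*²*ξ* = 2 *dθ*<sup>T</sup>**B** *dθ*, so the theorem gives **H** = **B** + **B**<sup>T</sup>. The Python/NumPy sketch checks this against a central-difference Hessian.

```python
import numpy as np

# Hypothetical quadratic outcome with a non-symmetric coefficient matrix
B = np.array([[1.0, 3.0],
              [0.0, 2.0]])

def xi(theta):
    return theta @ B @ theta

# Second Identification Theorem: d2(xi) = dtheta^T (2B) dtheta,
# so H = (1/2)(2B + 2B^T) = B + B^T
H_formula = B + B.T

# Numerical Hessian by central differences at an arbitrary point
theta0 = np.array([0.3, -0.7])
h = 1e-5
H_num = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        ei = np.zeros(2); ei[i] = h
        ej = np.zeros(2); ej[j] = h
        H_num[i, j] = (xi(theta0 + ei + ej) - xi(theta0 + ei - ej)
                       - xi(theta0 - ei + ej) + xi(theta0 - ei - ej)) / (4 * h * h)
print(H_num)
print(H_formula)
```

The symmetrization in (3.51) matters precisely because a matrix **B** satisfying (3.50) is not unique; only its symmetric part is identified.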

Shyu and Caswell (2014) used this approach to derive the second derivatives of the population growth rate *λ*, the continuous-time population growth rate *r* = log *λ*, and the net reproductive rate $R\_0$, with respect to changes in either the entries of **A** or arbitrary lower-level parameters of which **A** is a function. We will not explore second derivatives in this book, but Shyu's other work (Shyu and Caswell 2016a,b) applies them to analyze the evolutionary demography of sex ratios, and Caswell and Shyu (2017) use them to analyze the effects of mortality on the selection gradients on senescence.

#### **3.6 Conclusion**

Each of the three approaches to growth rate sensitivity, leading to Eqs. (3.7), (3.8), (3.29), and (3.42), uses its own analytical methods. They agree, however, in showing how the sensitivity of population growth rate can be written in terms of the stable stage distribution and the reproductive value. In general, the effect of a change in the rate at which individuals move from stage *j* to stage *i* is proportional to the abundance of the origin stage (*j* ) and the reproductive value of the destination stage (*i*). If a transition yields individuals with low reproductive value, or if few individuals are available to experience the change in the rate of transition, the effect on population growth will be small.

#### **Bibliography**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 4 Sensitivity Analysis of Longevity and Life Disparity**

#### **4.1 Introduction**

The population growth rate (*λ* or *r*) analyzed in Chap. 3 is a population-level consequence of the individual-level vital rates. A similarly basic outcome, at the individual or cohort level, is longevity: the length of individual life. The most commonly encountered description of longevity is its expectation, the life expectancy. However, longevity is a random variable, differing among individuals (even when those individuals are subject to the same rates and hazards) because of the random vagaries of mortality and survival. Therefore, it is important to also consider its variance and higher moments. This chapter introduces the sensitivity analysis of longevity, which will be explored in more detail in Chaps. 5, 11, and 12.

As in Chap. 3, we will begin by reviewing a classic formula for the sensitivity of life expectancy in age-classified models. Then we will use matrix calculus to derive more general formulas for the moments of longevity, the distribution of age or stage at death, and the life disparity, applicable to age- or stage-classified populations.

#### **4.2 Life Expectancy in Age-Classified Populations**

**Notation** It is customary to denote life expectancy by symbols like $e^{\circ}\_x$ or *e(x)*, but in general the symbol *e* plays too many roles in mathematics to be helpful for our purposes. So, when we make the transition to matrix formulations, I will use the symbol *η*, in various vector and scalar manifestations, to indicate longevity.

Perturbation analysis of longevity has been pursued mostly within the framework of age-classified life cycles (e.g., Canudas Romo 2003; Keyfitz 1971; Pollard 1982; Vaupel 1986; Vaupel and Canudas Romo 2003). The life expectancy at age *x* is given by

$$e(x) = \frac{1}{\ell(x)} \int\_{x}^{\infty} \ell(s)\, ds \tag{4.1}$$

where the survivorship function $\ell(x)$ is the probability of survival to age *x*.

The classical result for the sensitivity of life expectancy at birth to a change in mortality at age *a* is

$$\frac{de(0)}{d\mu(a)} = -\ell(a)e(a). \tag{4.2}$$

That is, the sensitivity of life expectancy at birth to a change in mortality at age *a* is equal to the product of the probability of survival to age *a* and the life expectancy at age *a*. In other words, *e(*0*)* is most sensitive to changes in mortality at ages to which lots of individuals survive (to experience the change in mortality) and beyond which there is lots of longevity remaining (so they can enjoy the change in mortality). The derivative is negative because increasing mortality reduces life expectancy.
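Equation (4.2) can be verified on a discretized life table. In the Python/NumPy sketch below, the Gompertz hazard parameters, the age grid, and the perturbation age are all hypothetical choices; the mortality impulse is approximated by adding a spike of weight ε to a single grid cell.

```python
import numpy as np

da = 0.001
age = np.arange(0.0, 120.0, da)
mu = 0.0003 * np.exp(0.1 * age)                 # hypothetical Gompertz hazard mu(a)

def life_table(mu):
    l = np.exp(-np.cumsum(mu) * da)             # survivorship l(x)
    e0 = np.sum(l) * da                         # life expectancy at birth
    return l, e0

l, e0 = life_table(mu)

# Remaining life expectancy at age a: e(a) = (1/l(a)) * int_a^inf l(s) ds
i = int(50.0 / da)                              # perturb mortality at age 50
e_a = np.sum(l[i:]) * da / l[i]

# Approximate impulse of weight eps at age 50
eps = 1e-6
mu2 = mu.copy(); mu2[i] += eps / da
_, e0_2 = life_table(mu2)
fd = (e0_2 - e0) / eps
print(fd, -l[i] * e_a)                          # Eq. (4.2): should agree
```

In the discretized system the agreement is exact to first order in ε: the perturbation multiplies every *ℓ(s)* with *s* ≥ 50 by *e*<sup>−ε</sup>, which is precisely how −*ℓ(a)e(a)* arises in the derivation below.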

The result was presented independently by Keyfitz (1971) who also referenced some earlier approaches (Wilson 1938; Irwin 1949) and by Pollard (1982). Keyfitz's derivation was sketchy, and Pollard simply stated that the result was well-known, and gave no derivation. From a general sensitivity analysis perspective, we can derive the result using the same approach applied in Chap. 3 to population growth rate.

#### *4.2.1 Derivation*

Differentiating (4.1) with respect to mortality at some specified age *a* gives

$$\frac{de(0)}{d\mu(a)} = \int\_0^\infty \frac{d\ell(s)}{d\mu(a)} ds\tag{4.3}$$

and our problem reduces to finding the derivative of *(s)* with respect to *μ(a)*. To do so, introduce a parameter *θ* to measure the size of the perturbation at age *a*, and write mortality as

$$
\mu(x, \theta) = \mu(x, 0) + \theta \,\delta(x - a) \tag{4.4}
$$

where *δ(x − a)* is the Dirac delta function.<sup>1</sup> The derivative with respect to *μ(a)* is obtained by differentiating with respect to *θ* and evaluating the result at *θ* = 0.

<sup>1</sup>See Chap. 3 for a description of this generalized function.


Write survivorship as

$$\ell(x, \theta) = \exp\left[-\int\_0^x \mu(z, \theta)\, dz\right] \tag{4.5}$$

so that

$$\frac{d\ell(x,\theta)}{d\theta} = -\ell(x,\theta) \int\_0^x \frac{d\mu(z,\theta)}{d\theta}\, dz \tag{4.6}$$

From (4.4) we have

$$\frac{d\mu(z,\theta)}{d\theta} = \delta(z-a) \tag{4.7}$$

so that

$$\frac{d\ell(x,\theta)}{d\theta} = -\ell(x,\theta) \int\_0^x \delta(z-a)\, dz\tag{4.8}$$

$$=-\ell(x, \theta)H(x - a)\tag{4.9}$$

where *H(·)* is the unit step function. Substituting this into (4.3) and evaluating at *θ* = 0 gives

$$\frac{de(0)}{d\mu(a)} = -\int\_0^\infty \ell(s)H(s-a)ds\tag{4.10}$$

$$=-\int\_{a}^{\infty} \ell(s)ds\tag{4.11}$$

which, by (4.1) is equal to (4.2).

#### **4.3 A Markov Chain Model for the Life Cycle**

Age has a special status in demography because it is continuous, linear, and permits movement in only one direction and at one rate (age increases by one unit for every unit of time). All other demographic characteristics have the potential for much greater flexibility, and the operators that describe movement and development of individuals require an equal degree of flexibility. This book is devoted to matrix formulations of these problems, which have the great advantage of permitting both age- and stage-classified models. The basic formulation, as far as longevity is concerned, is that of a finite-state absorbing Markov chain.

#### *4.3.1 A Markov Chain Formulation of the Life Cycle*

We describe the life cycle as an absorbing Markov chain. This approach was pioneered in demography by Feichtinger (1971) and Hoem (1969), and has been greatly extended in recent years (Caswell 2001, 2006, 2009; Horvitz and Tuljapurkar 2008; Tuljapurkar and Horvitz 2006; Steinsaltz and Evans 2004). Good sources for the basic theory of absorbing Markov chains are Kemeny and Snell (1976) and Iosifescu (1980).

These models will be explored in more detail in Chaps. 5 and 11. The sensitivity analysis of measures of variance in longevity has been developed by Van Raalte and Caswell (2013) and Engelman et al. (2014). An important extension of Markov chain models for longevity is the incorporation of "rewards" to represent the value, in some sense, of the length of life, extending methods developed for dynamic programming (Howard 1960). The rewards include the production of offspring (Caswell 2011; van Daalen and Caswell 2015, 2017), the accumulation of income and expenditures (Caswell and Kluge 2015) and healthy longevity (Caswell and Zarulli 2018). The sensitivity analysis of these important models is derived in van Daalen and Caswell (2017).

Markov chain theory distinguishes between *recurrent* and *transient* states. A recurrent state has the property that the probability of returning to that state at least once is 1. A transient state is one for which that probability is less than 1. If a Markov chain contains transient states, it will eventually leave those states and arrive in a recurrent state or class of states, where it will remain permanently. Such a chain is called *absorbing*. Absorbing chains are the basic model for the demography of individuals because life is inherently transient. Any individual will, with probability one, eventually leave the set of living states and be absorbed by death.

If a Markov chain consists of a single set of recurrent states that all communicate with each other, it is said to be ergodic. The transition matrix for an ergodic chain is irreducible and primitive. Ergodic Markov chains play a limited role in demographic contexts because they cannot include mortality. Chapter 11 will, however, present the sensitivity analysis of these models.

In demographic models, individuals move among a set of transient (i.e., living) states in their life cycle before they eventually reach an absorbing state (death). Transient states may represent age classes, developmental or life history stages, or states defined by health, employment, economic, or other kinds of status. In studying longevity, we are particularly interested in absorbing states representing death, or perhaps death classified by age or stage at death, or by cause of death. The analysis applies equally to other ways of leaving the life cycle (e.g., graduation in a model of educational states, discharge from treatment in a model of health states).

Number the stages in the life cycle so that the transient states are 1*,...,s* and the absorbing states are *s* + 1*,...,s* + *a*. Then the transition matrix of the Markov chain is

$$\mathbf{P} = \begin{pmatrix} \mathbf{U} & \mathbf{0} \\ \mathbf{M} & \mathbf{I} \end{pmatrix} \tag{4.12}$$

Here, **U** is the *s* × *s* matrix of transition probabilities among the transient states. The *a* × *s* matrix **M** gives the probabilities of absorption in each of the absorbing states. The columns of **P** sum to one. I assume that the spectral radius (the dominant eigenvalue) of **U** is strictly less than one; a sufficient condition for this is that there is a non-zero probability of ultimate death for every stage.

Age-classified models are a special case with survival probabilities on the subdiagonal (and possibly in the last diagonal entry); e.g., for *s* = 3 in which

$$\mathbf{U} = \begin{pmatrix} 0 & 0 & 0 \\ p\_1 & 0 & 0 \\ 0 & p\_2 & p\_3 \end{pmatrix} \tag{4.13}$$

The age-specific survival probability is $p\_i = e^{-\mu\_i}$, with $\mu\_i$ a mortality rate applying to age class $i$. The $(s,s)$ entry of **U** is an age-independent survival probability for a final open-ended age class, with a remaining life expectancy of $1/(1-p\_s)$. If $p\_s = 0$ no one survives beyond age class $s$. When the age-classified model is constructed from a life table, $p\_i = 1 - q\_{i-1}$; that is, the survival of age-class 1 is the complement of the probability of death between ages 0 and 1.

The mortality matrix **M** gives the probabilities of transition from each of the transient states to each of the absorbing states. Figure 4.1 shows some examples of life cycle formulations that can arise, including both age and stage classification in the transient states, and absorbing states classified by age at death, grouped ages at death, stage at death, or cause of death. The resulting mortality matrices are

$$\text{Figure 4.1a} \qquad \mathbf{M} = \begin{pmatrix} 1-P\_1 & 1-P\_2 & 1-P\_3 & 1-P\_4 \end{pmatrix} \tag{4.14}$$

$$\text{Figure 4.1b} \qquad \mathbf{M} = \begin{pmatrix} 1 - P\_1 & 0 & 0 & 0 \\ 0 & 1 - P\_2 & 0 & 0 \\ 0 & 0 & 1 - P\_3 & 0 \\ 0 & 0 & 0 & 1 - P\_4 \end{pmatrix} \tag{4.15}$$

$$\text{Figure 4.1c} \qquad \mathbf{M} = \begin{pmatrix} 1-P\_1 & 1-P\_2 & 0 & 0 \\ 0 & 0 & 1-P\_3 & 1-P\_4 \end{pmatrix} \tag{4.16}$$

$$\text{Figure 4.1d} \qquad \mathbf{M} = \begin{pmatrix} q\_1 & 0 & 0 & 0 \\ 0 & q\_2 & 0 & 0 \\ 0 & 0 & q\_3 & 0 \\ 0 & 0 & 0 & q\_4 \end{pmatrix} \tag{4.17}$$

**Fig. 4.1** Life cycle graphs showing some alternative choices for structure of the absorbing state: death, age at death, stage at death, or cause of death. (**a**) Age-classified with one dead state. (**b**) Age-classified, age at death. (**c**) Age-classified, grouped ages at death. (**d**) Stage-classified, stage at death. (**e**) Age-classified, causes of death

$$\text{Figure 4.1e} \qquad \mathbf{M} = \begin{pmatrix} q\_1 & q\_2 & q\_3 & q\_4 \\ s\_1 & s\_2 & s\_3 & s\_4 \end{pmatrix} \tag{4.18}$$

The beauty of formulating longevity as a Markov chain is that many statistics of longevity can be written in terms of the matrices **U** and **M** and sensitivity analysis can be carried out using matrix calculus.
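As a minimal numerical sketch of this formulation (in Python with NumPy; the mortality rates are invented for illustration), the matrices **U** and **M** for an age-classified model with a single dead state can be assembled into the block matrix **P** of Eq. (4.12):

```python
import numpy as np

# invented age-specific mortality rates for s = 4 age classes
mu = np.array([0.05, 0.01, 0.02, 0.10])
p = np.exp(-mu)                    # survival probabilities p_i = exp(-mu_i)
s = len(mu)

# U: survival probabilities on the subdiagonal (no open-ended class here)
U = np.diag(p[:-1], k=-1)

# M: a single dead state; death probability is 1 minus the column sum of U
M = (1.0 - U.sum(axis=0)).reshape(1, s)

# block transition matrix P of Eq. (4.12); its columns sum to one
P = np.block([[U, np.zeros((s, 1))],
              [M, np.ones((1, 1))]])
assert np.allclose(P.sum(axis=0), 1.0)
```

The final assertion verifies that **P** is column-stochastic, as required of a Markov chain transition matrix in this (column-oriented) convention.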

#### *4.3.2 Occupancy Times*

Consider an individual in transient state *j*. Eventual absorption is certain. But before that, the individual will occupy various transient states. The number of such visits, the occupancy time,<sup>2</sup> is the basic unit of longevity. Occupancy is particularly central in studies of health demography, where it quantifies the parts of a life spent in different health states. But, even without the added dimension of something like health, occupancy of transient states is the basis of longevity analysis.

Let *νij* be the number of visits to transient state *i* by an individual in transient state *j* , prior to absorption. Its expectation is given by the fundamental matrix (e.g., Kemeny and Snell 1976; Iosifescu 1980)

$$\mathbf{N} = \left( E(\nu\_{ij}) \right) \tag{4.19}$$

$$= \left(\mathbf{I} - \mathbf{U}\right)^{-1} \tag{4.20}$$

More details, and examples, for the higher moments and variances of occupancy times are given in Chaps. 5 and 11.
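Computing the fundamental matrix of Eq. (4.20) is a one-line operation; a small sketch (Python/NumPy, with invented mortality rates) also checks it against the series interpretation of expected visits:

```python
import numpy as np

mu = np.array([0.05, 0.01, 0.02, 0.10])   # invented mortality rates
p = np.exp(-mu)
U = np.diag(p[:-1], k=-1)                 # age-classified, s = 4, no open-ended class
s = U.shape[0]

# fundamental matrix, Eq. (4.20): expected number of visits to each
# transient state, by starting state
N = np.linalg.inv(np.eye(s) - U)

# sanity check: N equals the series I + U + U^2 + ..., which terminates
# here because a strictly sub-diagonal U is nilpotent
series = np.eye(s) + U + U @ U + U @ U @ U
assert np.allclose(N, series)
```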

#### *4.3.3 Longevity*

The longevity of an individual in state *j* can be equated to the total occupancy time of all transient states by that individual, prior to eventual absorption. Let $\eta\_j$ be this longevity; the expectation of $\eta\_j$ is the sum of the elements in column *j* of **N**. We define $\boldsymbol{\eta}\_1$ and $\boldsymbol{\eta}\_2$ as the vectors containing the first and second moments of longevity, respectively. Then

$$E(\boldsymbol{\eta})^{\mathsf{T}} = \boldsymbol{\eta}\_1^{\mathsf{T}} = \mathbf{1}^{\mathsf{T}} \mathbf{N} \tag{4.21}$$

Figure 4.2a shows the life expectancy for India in 1961 and Japan in 2006.

The vector of the second moments of longevity satisfies

$$
\boldsymbol{\eta}\_2^\mathsf{T} = \boldsymbol{\eta}\_1^\mathsf{T} \left(2\mathbf{N} - \mathbf{I}\right) \tag{4.22}
$$

(Iosifescu 1980). The variance and standard deviation of longevity are thus

$$V\left(\boldsymbol{\eta}\right) = \boldsymbol{\eta}\_2 - \boldsymbol{\eta}\_1 \circ \boldsymbol{\eta}\_1 \tag{4.23}$$

$$SD(\boldsymbol{\eta}) = \sqrt{V(\boldsymbol{\eta})} \tag{4.24}$$

where the square root is taken element-wise.
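Equations (4.21)–(4.24) translate directly into a few lines of matrix arithmetic; a sketch (Python/NumPy, invented rates):

```python
import numpy as np

mu = np.array([0.05, 0.01, 0.02, 0.10])   # invented mortality rates
U = np.diag(np.exp(-mu)[:-1], k=-1)       # age-classified, s = 4
s = U.shape[0]
N = np.linalg.inv(np.eye(s) - U)

ones = np.ones(s)
eta1 = ones @ N                        # Eq. (4.21): life expectancy by age class
eta2 = eta1 @ (2 * N - np.eye(s))      # Eq. (4.22): second moments
V = eta2 - eta1 * eta1                 # Eq. (4.23): variance, element-wise
SD = np.sqrt(V)                        # Eq. (4.24), element-wise square root
```

Because no one survives beyond the last age class in this sketch, the last entries of `eta1` and `V` are exactly 1 and 0, respectively: longevity from the final class is deterministic.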

<sup>2</sup>Because time is discrete here, the number of visits is equal to the number of time increments, which is the amount of time spent in the state. In continuous-time models, the number of visits to, and the length of time spent in, a transient state are different. The corresponding calculations for continuous-time models are given in Chap. 12.

**Fig. 4.2** Calculations for longevity of India (1961) and Japan (2006). (**a**) Remaining life expectancy as a function of age. (**b**) Standard deviation of remaining longevity as a function of age. Vertical line at age 10 indicates *SD*10, sometimes used as a measure of lifespan disparity. (**c**) Sensitivity of life expectancy at birth to changes in mortality at each age. (**d**) Sensitivity of variance in longevity at birth to changes in mortality at each age. (**e**) Sensitivity of life disparity *η*† to changes in mortality at each age

Note that $V(\boldsymbol{\eta})$ and $SD(\boldsymbol{\eta})$ are vectors; their elements give the variance or standard deviation of longevity for individuals in each stage, making it easy to examine variation in remaining longevity conditional on the starting age. This conditioning can be important; Edwards and Tuljapurkar (2005) have made a strong case that $SD(\eta\_{10})$, starting from age 10, is a good index to prevent infant and child mortality from obscuring patterns in old age longevity.

Figure 4.2b shows $SD(\boldsymbol{\eta})$ for India and Japan. The standard deviation at birth, $SD(\eta\_1)$, is roughly twice as great in India as in Japan, a discrepancy that remains at $SD(\eta\_{10})$. Eventually, beyond the age of 50, $SD(\boldsymbol{\eta})$ becomes greater in Japan than in India.

#### *4.3.4 Age or Stage at Death*

If the model contains more than one absorbing state (as in all the cases but the first in Fig. 4.1), the eventual fate of an individual is uncertain. The probability distributions of the eventual absorbing state are given by the columns of the matrix

$$\mathbf{B} = \mathbf{M} \mathbf{N} \tag{4.25}$$

where *bij* is the probability of eventual absorption in absorbing state *i* for an individual starting in transient state *j* (Iosifescu 1980).

Suppose that the absorbing stages are defined as the age (or stage) at death, as in Fig. 4.1b, d. Then **M** is given by Eq. (4.17) and the *j* th column of **B** is the probability distribution of age at death for an individual starting in age class *j* :

$$\boldsymbol{\psi}\_{j} = \mathbf{B}(:,\,j) = \mathbf{B}\mathbf{e}\_{j}. \tag{4.26}$$
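A small sketch (Python/NumPy, invented rates) of Eqs. (4.25)–(4.26), with absorbing states classified by age class at death as in Fig. 4.1b:

```python
import numpy as np

mu = np.array([0.05, 0.01, 0.02, 0.10])   # invented mortality rates
p = np.exp(-mu)
s = len(mu)

U = np.diag(p[:-1], k=-1)
U[-1, -1] = p[-1]                  # open-ended last age class
N = np.linalg.inv(np.eye(s) - U)

# absorbing states classified by age class at death: M = I - D(p)
M = np.eye(s) - np.diag(p)

B = M @ N                          # Eq. (4.25)
psi1 = B[:, 0]                     # Eq. (4.26): age-at-death distribution from class 1
assert np.isclose(psi1.sum(), 1.0) # absorption somewhere is certain
```

Each column of `B` is a probability distribution over the absorbing states, so every column sums to one.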

#### *4.3.5 Life Lost and Life Disparity*

When an individual dies, it loses the remaining life that it would have experienced, had it not died. This counterfactual proposition seems abstract, but we can make it concrete by asking for the expectation of that lost lifetime. An individual that dies at age *x* will lose, on average, an amount of life given by the life expectancy at age *x*. Averaging this remaining life expectancy over the distribution of age at death gives the mean life lost due to mortality. Vaupel and Canudas Romo (2003) denoted the life lost by *e*†. Here we define the vector *η*†, whose *i*th entry is the expected life lost due to mortality by an individual starting in age class *i*; it is given by

$$\left(\boldsymbol{\eta}^{\dagger}\right)^{\mathsf{T}} = \boldsymbol{\eta}\_{1}^{\mathsf{T}} \mathbf{B}.\tag{4.27}$$

Calculations of life lost from mortality due to specific causes of death play a central role in the calculations of disability-adjusted life years (DALYs) used in calculations of the burden of diseases (e.g., Devleesschauwer et al. 2014; GBD 2016 DALYs and HALE Collaborators 2017). See Caswell and Zarulli (2018) for the relationship between DALY calculations and Markov chain methods, and for a calculation of the variance in life lost.

The life lost $\boldsymbol{\eta}^{\dagger}$ has an additional interpretation as a measure of disparity. Consider a population in which everyone dies at the same age. In such a situation, $\boldsymbol{\eta}^{\dagger} = \mathbf{0}$, because at the age of death, there is no additional life expectancy. Thus $\boldsymbol{\eta}^{\dagger}$ is a measure of "life disparity;" the larger its value, the more disparity there is among individuals in age at death (Vaupel et al. 2011).

The values of life disparity in age class 1, for Japan and India, in years, are

$$\eta\_1^\dagger = \begin{cases} 10.1 & \text{Japan} \\ 23.9 & \text{India} \end{cases} \tag{4.28}$$

Just as India has a much larger variance in longevity than Japan, it also has a higher life disparity.
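Equation (4.27) combines quantities already computed; a sketch (Python/NumPy, invented rates, death classified by age class at death):

```python
import numpy as np

mu = np.array([0.05, 0.01, 0.02, 0.10])   # invented mortality rates
p = np.exp(-mu)
s = len(mu)

U = np.diag(p[:-1], k=-1)
U[-1, -1] = p[-1]                  # open-ended last age class
N = np.linalg.inv(np.eye(s) - U)
M = np.eye(s) - np.diag(p)         # death classified by age class at death
B = M @ N                          # Eq. (4.25)

eta1 = np.ones(s) @ N              # life expectancy, Eq. (4.21)
eta_dagger = eta1 @ B              # life disparity vector, Eq. (4.27)
```

Entry *i* of `eta_dagger` averages the remaining life expectancy at death over the distribution of age at death, starting from age class *i*.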

#### **4.4 Sensitivity Analysis**

Our goal is to obtain expressions for the derivatives of $E(\boldsymbol{\eta})$, $V(\boldsymbol{\eta})$, $SD(\boldsymbol{\eta})$, **B**, and $\boldsymbol{\eta}^{\dagger}$ with respect to changes in age-specific mortality rates. The calculations and some results (contrasting the mortality schedules of Japan and India) are given here. More details are presented in Chaps. 5 and 11. Results are presented in terms of an arbitrary vector *θ* of parameters on which **U** and **M** depend. In the examples, *θ* will be the vector *μ* of age-specific mortality rates.

#### *4.4.1 Sensitivity of the Fundamental Matrix*

The fundamental matrix **N** appears in many of these formulas. Its sensitivity was first obtained by Caswell (2006). Suppose that **U** is a function of some vector *θ* of parameters. Then

$$\frac{d\mathbf{vec}\,\mathbf{N}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\mathbf{N}^{\mathsf{T}} \otimes \mathbf{N}\right) \frac{d\mathbf{vec}\,\mathbf{U}}{d\boldsymbol{\theta}^{\mathsf{T}}}\tag{4.29}$$

(see Chap. 5).
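Equation (4.29) is easy to verify numerically. A sketch (Python/NumPy, invented rates) takes *θ* = vec **U**, so the Jacobian is simply the Kronecker product $\mathbf{N}^{\mathsf{T}} \otimes \mathbf{N}$, and checks one column against a finite difference:

```python
import numpy as np

mu = np.array([0.05, 0.01, 0.02, 0.10])   # invented mortality rates
U = np.diag(np.exp(-mu)[:-1], k=-1)       # age-classified, s = 4
s = U.shape[0]
N = np.linalg.inv(np.eye(s) - U)

# Eq. (4.29) with theta = vec U: Jacobian of vec N with respect to vec U
dvecN_dvecU = np.kron(N.T, N)

# finite-difference check on a single entry of U; vec uses column-major
# order, so entry (i, j) of U corresponds to column j*s + i of the Jacobian
i, j, h = 1, 0, 1e-7
U2 = U.copy(); U2[i, j] += h
N2 = np.linalg.inv(np.eye(s) - U2)
fd = (N2 - N).flatten(order='F') / h
assert np.allclose(fd, dvecN_dvecU[:, j * s + i], atol=1e-5)
```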

#### *4.4.2 Sensitivity of Life Expectancy*

The sensitivity of the vector of life expectancy as a function of age is obtained by differentiating (4.21),

$$d\eta\_1^\mathsf{T} = \mathbf{1}^\mathsf{T}(d\mathbf{N})\tag{4.30}$$

Applying the vec operator and Roth's theorem (2.13) gives

$$d\eta\_1 = \left(\mathbf{I} \otimes \mathbf{1}^\mathsf{T}\right) d\mathbf{vec} \,\mathbf{N} \tag{4.31}$$

$$= \left(\mathbf{I} \otimes \mathbf{1}^{\mathsf{T}}\right) \left(\mathbf{N}^{\mathsf{T}} \otimes \mathbf{N}\right) d\mathbf{vec} \,\mathbf{U} \tag{4.32}$$

$$= \left(\mathbf{N}^{\mathsf{T}} \otimes \boldsymbol{\eta}\_{\mathsf{l}}^{\mathsf{T}}\right) d\mathbf{vec} \,\mathbf{U}.\tag{4.33}$$

The last step uses the fact that *(***A** ⊗ **B***)(***C** ⊗ **D***)* = *(***AC** ⊗ **BD***)*. Applying the chain rule and the first identification theorem gives the result

$$\frac{d\boldsymbol{\eta}\_{\rm l}}{d\boldsymbol{\theta}^{\sf T}} = \left(\mathbf{N}^{\sf T} \otimes \boldsymbol{\eta}\_{\rm l}^{\sf T}\right) \frac{d\mathbf{vec} \,\mathbf{U}}{d\boldsymbol{\theta}^{\sf T}}\tag{4.34}$$

**Sensitivity to mortality** If interest focuses on changes in age-specific mortality, so that *θ* = *μ*, then the sensitivity formula expands, using the chain rule, to

$$\frac{d\boldsymbol{\eta}\_{\rm l}}{d\boldsymbol{\mu}^{\rm T}} = \left(\mathbf{N}^{\rm T} \otimes \boldsymbol{\eta}\_{\rm l}^{\rm T}\right) \frac{d\text{vec}\,\mathbf{U}}{d\boldsymbol{\mu}^{\rm T}}\tag{4.35}$$

This can be evaluated in several ways, depending on how the matrix **U** is written as a function of mortality. One approach is used in Sect. 4.4.3, and a somewhat more widely useful approach in Sect. 4.4.4.

The results for Japan and India are shown in Fig. 4.2. Life expectancy is more sensitive to changes in mortality in Japan than in India; the (absolute value of) sensitivity decreases almost linearly with age in Japan, and slightly less linearly in India (Fig. 4.2). On the other hand, life expectancy is more elastic to changes in mortality in India, and less so in Japan.
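A sketch of Eq. (4.35) in Python/NumPy (invented mortality rates; the derivative of vec **U** with respect to *μ* uses the age-classified expression of Eq. (4.62) derived in Sect. 4.4.4), verified against finite differences:

```python
import numpy as np

def life_exp(mu):
    """Vector of remaining life expectancies eta_1, Eq. (4.21)."""
    p = np.exp(-mu)
    s = len(mu)
    U = np.diag(p[:-1], k=-1)
    N = np.linalg.inv(np.eye(s) - U)
    return np.ones(s) @ N

mu = np.array([0.05, 0.01, 0.02, 0.10])   # invented mortality rates
p = np.exp(-mu)
s = len(mu)
U = np.diag(p[:-1], k=-1)
N = np.linalg.inv(np.eye(s) - U)
eta1 = np.ones(s) @ N

# dvec U / dmu^T from Eq. (4.62): L is the age-advancement matrix
L = np.diag(np.ones(s - 1), k=-1)
dvecU_dmu = (-np.diag(L.flatten(order='F'))
             @ np.kron(np.eye(s), np.ones((s, 1)))
             @ np.diag(p))

# Eq. (4.35): sensitivity of all remaining life expectancies to mortality
deta1_dmu = np.kron(N.T, eta1.reshape(1, -1)) @ dvecU_dmu

# finite-difference check, column by column
h = 1e-7
for a in range(s):
    mu2 = mu.copy(); mu2[a] += h
    fd = (life_exp(mu2) - eta1) / h
    assert np.allclose(fd, deta1_dmu[:, a], atol=1e-4)
```

All entries of `deta1_dmu` are non-positive: increasing mortality at any age cannot increase any remaining life expectancy.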

#### *4.4.3 Generalizing the Keyfitz-Pollard Formula*

The Keyfitz-Pollard formula for the sensitivity of life expectancy to changes in mortality rate, given in Eq. (4.2), has a clear interpretation: the sensitivity to mortality at age *a* depends on the probability of survival to age *a* and the remaining life expectancy at age *a*. We are now in a position to generalize this to stage-classified matrix models.

First, we derive the matrix version of the Keyfitz-Pollard result, for the sensitivity of life expectancy of age class 1, which is

$$dE\,\left(\eta\_{\text{l}}\right) = \left(\mathbf{e}\_{\text{l}}^{\mathsf{T}} \otimes \mathbf{1}^{\mathsf{T}}\right)d\mathbf{vec}\,\mathbf{N}\tag{4.36}$$

$$= \left(\mathbf{e}\_{l}^{\mathsf{T}} \otimes \mathbf{1}^{\mathsf{T}}\right) \left(\mathbf{N}^{\mathsf{T}} \otimes \mathbf{N}\right) d\mathbf{vec} \,\mathbf{U} \tag{4.37}$$

Consider a population with *s* age classes and let *μi* be the mortality rate and *pi* = exp*(*−*μi)* the survival probability for age class *i*. The matrix **U** is given by (4.13), which can be written

$$\mathbf{U} = \sum\_{k=1}^{s-1} \left( \mathbf{e}\_{k+1} \mathbf{e}\_k^\mathsf{T} \right) \ p\_k \tag{4.38}$$

where **e***<sup>k</sup>* is the unit vector, of length *s*, with a 1 in the *k*th position and zeros elsewhere. Differentiating **U** and applying the vec operator gives

$$d\text{vec}\,\mathbf{U} = -\sum\_{k=1}^{s-1} (\mathbf{e}\_k \otimes \mathbf{e}\_{k+1})\,\,\, p\_k\,\, (d\mu\_k)\tag{4.39}$$

Substitute (4.39) into (4.37) and consider a perturbation of mortality at age *a*; the result is

$$\frac{dE(\eta\_{1})}{d\mu\_{a}} = -\left(\mathbf{e}\_{1}^{\mathsf{T}} \otimes \mathbf{1}^{\mathsf{T}}\right) \left(\mathbf{N}^{\mathsf{T}} \otimes \mathbf{N}\right) (\mathbf{e}\_{a} \otimes \mathbf{e}\_{a+1}) \cdot p\_{a}.\tag{4.40}$$

This simplifies to

$$\frac{dE(\eta\_1)}{d\mu\_a} = -\left(\mathbf{e}\_1^\mathsf{T}\mathbf{N}^\mathsf{T}\mathbf{e}\_a \otimes \mathbf{1}^\mathsf{T}\mathbf{N}\mathbf{e}\_{a+1}\right)p\_a\tag{4.41}$$

$$=-\underbrace{E\left(\nu\_{a1}\right) p\_{a}}\_{\text{survival}}\ \underbrace{E\left(\eta\_{a+1}\right)}\_{\text{expectancy}}\qquad\text{age-classified}\tag{4.42}$$

In an age-classified model, $\nu\_{a1}$ is either 0 or 1 (you cannot occupy a year of age for more than 1 year); hence $E(\nu\_{a1})$ is the probability of survival to age *a*. Thus we have a matrix version of the Keyfitz-Pollard result: the sensitivity of life expectancy is the probability of survival to age *a*, times the probability of survival from *a* to *a* + 1, times the life expectancy at age *a* + 1.
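This interpretation can be checked numerically. A sketch (Python/NumPy, invented rates) compares the Keyfitz-Pollard product of Eq. (4.42) with a finite-difference perturbation of each mortality rate:

```python
import numpy as np

mu = np.array([0.05, 0.01, 0.02, 0.10])   # invented mortality rates
p = np.exp(-mu)
s = len(mu)
U = np.diag(p[:-1], k=-1)
N = np.linalg.inv(np.eye(s) - U)
eta1 = np.ones(s) @ N                     # remaining life expectancy by age class

def sens_kp(a):
    """Eq. (4.42), zero-based age index a: -E(nu_a1) p_a E(eta_{a+1})."""
    return -N[a, 0] * p[a] * eta1[a + 1]

# finite-difference check of dE(eta_1)/dmu_a
h = 1e-7
for a in range(s - 1):
    mu2 = mu.copy(); mu2[a] += h
    U2 = np.diag(np.exp(-mu2)[:-1], k=-1)
    e2 = (np.ones(s) @ np.linalg.inv(np.eye(s) - U2))[0]
    assert np.isclose((e2 - eta1[0]) / h, sens_kp(a), atol=1e-4)
```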

Now apply the same approach to a stage-classified model, in which **U** can be written as the product of a diagonal matrix with survival probabilities on the diagonal, and a stochastic matrix **G** giving the transition probabilities conditional on survival:


$$\mathbf{U} = \mathbf{G}\boldsymbol{\Sigma}\tag{4.43}$$

$$= \mathbf{G} \begin{pmatrix} p\_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & p\_s \end{pmatrix} \tag{4.44}$$

$$=\mathbf{G}\sum\_{k=1}^{s}\left(\mathbf{e}\_{k}\mathbf{e}\_{k}^{\mathsf{T}}\right)p\_{k}\tag{4.45}$$

Differentiating and applying the vec operator gives

$$d\text{vec}\,\mathbf{U} = -\sum\_{k=1}^{s} \left(\mathbf{e}\_{k} \otimes \mathbf{G}\mathbf{e}\_{k}\right) p\_{k}\,\left(d\mu\_{k}\right) \tag{4.46}$$

Substitute this into (4.37) and focus on a change in mortality at stage *a*; the result is

$$\frac{dE(\eta\_1)}{d\mu\_a} = -\left(\mathbf{e}\_1^\mathsf{T} \otimes \mathbf{1}^\mathsf{T}\right) \left(\mathbf{N}^\mathsf{T} \otimes \mathbf{N}\right) (\mathbf{e}\_a \otimes \mathbf{G} \mathbf{e}\_a) \cdot p\_a \tag{4.47}$$

which simplifies to

$$\frac{dE\,\left(\eta\_{1}\right)}{d\mu\_{a}} = -\left(\mathbf{e}\_{1}^{\mathsf{T}}\mathbf{N}^{\mathsf{T}}\mathbf{e}\_{a}\otimes\mathbf{1}^{\mathsf{T}}\mathbf{N}\mathbf{G}\mathbf{e}\_{a}\right)p\_{a} \tag{4.48}$$

$$= -E\left(\nu\_{a1}\right) E\left(\boldsymbol{\eta}^{\mathsf{T}}\right) \mathbf{G}(:,a)\, p\_a \tag{4.49}$$

$$=-\underbrace{E\left(\nu\_{a1}\right)}\_{\text{occupancy}}\sum\_{h=1}^{s}\underbrace{p\_{a}\,g\_{ha}}\_{\text{transitions}}\underbrace{E\left(\eta\_{h}\right)}\_{\text{expectancy}}\qquad\text{stage-classified}\quad(4.50)$$

Equation (4.50) is the stage-classified version of Keyfitz-Pollard: the sensitivity of life expectancy to a change in mortality in stage *a* is the product of the expected time spent in stage *a* and the remaining life expectancy, calculated as an average of the life expectancies of all stages *h*, weighted by the probability of transition from *a* to *h*. This can be simplified further by noting that, for either age- or stage-classified populations, **G***(*:*, a)pa* = **U***(*:*, a)*, so that a completely general expression is

$$\frac{dE\left(\eta\_{1}\right)}{d\mu\_{a}} = -E\left(\nu\_{a1}\right) E\left(\boldsymbol{\eta}^{\mathsf{T}}\right) \mathbf{U}(:,a) \qquad \text{age- or stage-classified} \tag{4.51}$$

#### *4.4.4 Sensitivity of the Variance of Longevity*

The sensitivity of the variance in longevity is obtained by differentiating (4.23)

$$dV\left(\boldsymbol{\eta}\right) = d\boldsymbol{\eta}\_2 - 2\left(\boldsymbol{\eta}\_1 \circ d\boldsymbol{\eta}\_1\right) \tag{4.52}$$

and applying the vec operator (using results from Chap. 2 on the vec of the Hadamard product), to obtain

$$dV\left(\boldsymbol{\eta}\right) = d\boldsymbol{\eta}\_2 - 2\,\mathcal{D}\left(\boldsymbol{\eta}\_1\right) d\boldsymbol{\eta}\_1. \tag{4.53}$$

The derivative of $\boldsymbol{\eta}\_1$ is already given by (4.33):

$$d\eta\_1 = \left(\mathbf{N}^{\mathsf{T}} \otimes \boldsymbol{\eta}\_1^{\mathsf{T}}\right) d\mathbf{vec} \,\mathbf{U}.\tag{4.54}$$

The derivative of *η*<sup>2</sup> is obtained by differentiating (4.22):

$$d\boldsymbol{\eta}\_2^{\mathsf{T}} = 2\left(d\boldsymbol{\eta}\_1^{\mathsf{T}}\right)\mathbf{N} + 2\,\boldsymbol{\eta}\_1^{\mathsf{T}}\left(d\mathbf{N}\right) - d\boldsymbol{\eta}\_1^{\mathsf{T}} \tag{4.55}$$

Applying the vec operator to both sides and substituting (4.29) for *d*vec **N** gives

$$d\eta\_2 = \left(2\mathbf{N}^\mathsf{T} - \mathbf{I}\right)d\eta\_1 + 2\left(\mathbf{N}^\mathsf{T}\otimes\eta\_1^\mathsf{T}\mathbf{N}\right)d\mathrm{vec}\,\mathbf{U} \tag{4.56}$$

Inserting (4.54) for $d\boldsymbol{\eta}\_1$ and (4.56) for $d\boldsymbol{\eta}\_2$ into (4.53) gives the sensitivity of the variance in remaining longevity, for any starting age or stage, to changes in **U**. The sensitivity of longevity to mortality is obtained by differentiating **U** with respect to *μ*.

**Derivatives of U** The derivatives of **U** with respect to the mortality vector *μ* are obtained as follows. For an age-classified model, define an age-advancement matrix

$$\mathbf{L} = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & [1] \end{pmatrix} \tag{4.57}$$

(shown here for three age classes; the bracketed entry is the optional survival of an open-ended last age class). This matrix masks the entries of the matrix $\mathbf{1}\mathbf{p}^{\mathsf{T}}$, which contains $\mathbf{p}^{\mathsf{T}}$ in each row, to obtain

$$\mathbf{U} = \mathbf{L} \circ \left(\mathbf{1}\mathbf{p}^{\mathsf{T}}\right) \tag{4.58}$$

Differentiating and applying the vec operator gives

$$d\mathbf{U} = \mathbf{L} \circ \left(\mathbf{1}\left(d\mathbf{p}^{\mathsf{T}}\right)\right) \tag{4.59}$$

$$d\mathbf{vec}\,\mathbf{U} = \mathcal{D}\left(\mathbf{vec}\,\mathbf{L}\right)(\mathbf{I}\otimes\mathbf{1})\,d\mathbf{p}.\tag{4.60}$$

Since **p** = exp*(*−*μ)*,

$$d\mathbf{p} = -\mathcal{D}\left(\mathbf{p}\right)d\boldsymbol{\mu},\tag{4.61}$$

and hence

$$d\operatorname{vec}\mathbf{U} = -\mathcal{D}\left(\operatorname{vec}\mathbf{L}\right)\left(\mathbf{I}\otimes\mathbf{1}\right)\mathcal{D}\left(\mathbf{p}\right)d\boldsymbol{\mu}\qquad\text{age-classified}\tag{4.62}$$

For a stage-classified model, write $\mathbf{U} = \mathbf{G}\boldsymbol{\Sigma}$, as in (4.44), as

$$\mathbf{U} = \mathbf{G} \left[ \mathbf{I} \circ \left( \mathbf{1} \mathbf{p}^{\mathsf{T}} \right) \right] \tag{4.63}$$

Differentiating and applying the vec operator, following the strategy of (4.60), gives

$$d\text{vec}\,\mathbf{U} = -\left(\mathbf{I}\otimes\mathbf{G}\right)\mathcal{D}\left(\text{vec}\,\mathbf{I}\right)\left(\mathbf{I}\otimes\mathbf{1}\right)\mathcal{D}\left(\mathbf{p}\right)d\mu\qquad\text{stage-classified}\qquad(4.64)$$

Substituting (4.62) and (4.64) into the expressions for $d\boldsymbol{\eta}\_1$ and $d\boldsymbol{\eta}\_2$, and substituting those into (4.53), gives the sensitivity of the variance in longevity to age- or stage-specific mortality. It is possible to carry out the substitutions and arrive at a single (large) expression for $dV(\boldsymbol{\eta})$; see Chap. 5.
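The assembly of Eqs. (4.34), (4.56), (4.53), and (4.62) can be sketched as follows (Python/NumPy, invented rates), with a finite-difference check on the result:

```python
import numpy as np

def moments(mu):
    """N, eta1, eta2, and V(eta) for an age-classified model, Eqs. (4.20)-(4.23)."""
    p = np.exp(-mu)
    s = len(mu)
    U = np.diag(p[:-1], k=-1)
    N = np.linalg.inv(np.eye(s) - U)
    eta1 = np.ones(s) @ N
    eta2 = eta1 @ (2 * N - np.eye(s))
    return N, eta1, eta2, eta2 - eta1 ** 2

mu = np.array([0.05, 0.01, 0.02, 0.10])   # invented mortality rates
p = np.exp(-mu)
s = len(mu)
N, eta1, eta2, V = moments(mu)

# dvec U / dmu^T, Eq. (4.62)
L = np.diag(np.ones(s - 1), k=-1)
dvecU = -np.diag(L.flatten(order='F')) @ np.kron(np.eye(s), np.ones((s, 1))) @ np.diag(p)

deta1 = np.kron(N.T, eta1.reshape(1, -1)) @ dvecU                  # Eq. (4.34)
deta2 = ((2 * N.T - np.eye(s)) @ deta1
         + 2 * np.kron(N.T, (eta1 @ N).reshape(1, -1)) @ dvecU)    # Eq. (4.56)
dV = deta2 - 2 * np.diag(eta1) @ deta1                             # Eq. (4.53)

# finite-difference check of dV(eta)/dmu^T
h = 1e-6
for a in range(s):
    mu2 = mu.copy(); mu2[a] += h
    assert np.allclose((moments(mu2)[3] - V) / h, dV[:, a], atol=1e-3)
```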

Figure 4.2d shows the sensitivity and elasticity of variance of longevity to changes in age-specific mortality. The variance is more sensitive to mortality changes in Japan than in India, and the sensitivities are highest at young ages. Both life tables have the property that sensitivities are positive at early ages (≈0–20 for India, ≈0–80 for Japan) and then become negative. Before this age, reductions in mortality will reduce variance; after this age, reductions in mortality increase the variance. See Sect. 4.4.6 for more on this.

#### *4.4.5 Sensitivity of the Distribution of Age at Death*

The sensitivity of the distribution of age or stage at death is obtained by differentiating (4.25) and applying the vec operator,

$$d\text{vec}\,\mathbf{B} = \left(\mathbf{N}^{\mathsf{T}} \otimes \mathbf{I}\right)d\text{vec}\,\mathbf{M} + \left(\mathbf{I} \otimes \mathbf{M}\right)d\text{vec}\,\mathbf{N}.\tag{4.65}$$

We already know *d*vec **N**. To obtain *d*vec **M**, note that when the absorbing states are defined in terms of stage at death

$$\mathbf{M} = \mathbf{I} - \mathcal{D}\left(\mathbf{p}\right) \tag{4.66}$$

and thus

$$d\text{vec}\,\mathbf{M} = -\mathcal{D}\left(\text{vec}\,\mathbf{I}\right)\left(\mathbf{I}\otimes\mathbf{1}\right)d\mathbf{p}\tag{4.67}$$

It is revealing to write the sensitivity of **B** to changes in mortality using the chain rule,

$$\frac{d\mathbf{vec}\,\mathbf{B}}{d\mu^{\mathsf{T}}} = \left(\mathbf{N}^{\mathsf{T}} \otimes \mathbf{I}\right) \frac{d\mathbf{vec}\,\mathbf{M}}{d\mathbf{p}^{\mathsf{T}}} \frac{d\mathbf{p}}{d\mu^{\mathsf{T}}} + \left(\mathbf{I} \otimes \mathbf{M}\right) \frac{d\mathbf{vec}\,\mathbf{N}}{d\mathbf{vec}^{\mathsf{T}}\mathbf{U}} \frac{d\mathbf{vec}\,\mathbf{U}}{d\mathbf{p}^{\mathsf{T}}} \frac{d\mathbf{p}}{d\mu^{\mathsf{T}}} \qquad (4.68)$$

and to recognize how many of the pieces we have already obtained.

The distribution of stage at death for individuals starting in stage *j* is given by column *j* of **B**; i.e., $\boldsymbol{\psi}\_j = \mathbf{B}(:,j)$. The sensitivity of $\boldsymbol{\psi}\_j$ to changes in mortality is

$$\frac{d\boldsymbol{\psi}\_j}{d\boldsymbol{\mu}^{\mathsf{T}}} = \left(\mathbf{e}\_j^{\mathsf{T}} \otimes \mathbf{I}\right) \frac{d\mathbf{vec}\,\mathbf{B}}{d\boldsymbol{\mu}^{\mathsf{T}}}\tag{4.69}$$

for any age or stage *j* of interest.
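The chain rule of Eq. (4.68) can be assembled piece by piece. A sketch (Python/NumPy, invented rates, an age-classified model with an open-ended last class so that Eq. (4.66) applies), again verified against finite differences:

```python
import numpy as np

mu = np.array([0.05, 0.01, 0.02, 0.10])   # invented mortality rates
s = len(mu)

def UMB(p):
    """U, M, N, B for an age-classified model with an open-ended last class."""
    U = np.diag(p[:-1], k=-1)
    U[-1, -1] = p[-1]
    M = np.eye(s) - np.diag(p)            # Eq. (4.66): death classified by stage
    N = np.linalg.inv(np.eye(s) - U)
    return U, M, N, M @ N                 # B = MN, Eq. (4.25)

p = np.exp(-mu)
U, M, N, B = UMB(p)

# pieces of Eq. (4.68)
I = np.eye(s)
K = np.kron(I, np.ones((s, 1)))                      # (I ⊗ 1)
dvecM_dp = -np.diag(I.flatten(order='F')) @ K        # Eq. (4.67)
L = np.diag(np.ones(s - 1), k=-1); L[-1, -1] = 1     # advancement matrix, Eq. (4.57)
dvecU_dp = np.diag(L.flatten(order='F')) @ K         # Eq. (4.60)
dp_dmu = -np.diag(p)                                 # Eq. (4.61)

dvecB_dmu = (np.kron(N.T, I) @ dvecM_dp
             + np.kron(I, M) @ np.kron(N.T, N) @ dvecU_dp) @ dp_dmu

# finite-difference check of Eq. (4.68)
h = 1e-7
for a in range(s):
    mu2 = mu.copy(); mu2[a] += h
    B2 = UMB(np.exp(-mu2))[3]
    assert np.allclose((B2 - B).flatten(order='F') / h, dvecB_dmu[:, a], atol=1e-4)
```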

#### *4.4.6 Sensitivity of Life Disparity*

To get the sensitivity of the vector *η*†, differentiate and apply the vec operator to Eq. (4.27), which gives

$$d\eta^\dagger = \mathbf{B}^\mathsf{T} d\eta\_1 + \left(\mathbf{I} \otimes \eta\_1^\mathsf{T}\right) d\text{vec} \,\mathbf{B}.\tag{4.70}$$

Evaluating this expression for the data on India and Japan, we see that the sensitivity of *η*† shows a pattern similar to that of the sensitivity of *V (η)* (Fig. 4.2), confirming that these indices are measuring similar aspects of disparity in longevity.

In particular, they show the existence of a critical age, before which reductions in mortality reduce disparity and after which they have the opposite effect. Zhang and Vaupel (2009) showed that this critical age, which they describe as separating "early" from "late" deaths is a general property of *η*†. Although the details depend on which index of disparity one uses, the existence of a critical age separating positive and negative sensitivities is also a property of other measures of variation in longevity (Van Raalte and Caswell 2013). Vaupel et al. (2011) have used the critical age to decompose historical changes in lifespan disparity into components due to early and late mortality.

#### **4.5 A Time-Series LTRE Decomposition: Life Disparity**

The LTRE decomposition analysis in Sect. 2.9 can be used to decompose time series such as these into their components. We apply it here to calculate the contributions, to a long trajectory of changes in *η*†, of changes in early and late mortality.

Suppose that some demographic outcome *ξ (t)* (dimension *s* × 1) is measured as a function of a parameter vector *θ* (dimension *p* × 1), at times *t* = 1*,* 2*,...,T* . The changes in *ξ (t)* over time result from the changes in the parameters,

$$
\Delta \xi(t) = \xi(t+1) - \xi(t) \tag{4.71}
$$

$$
\Delta\theta(t) = \theta(t+1) - \theta(t) \tag{4.72}
$$

The decomposition analysis for such sequences was introduced as a "regression LTRE" method in the context of ecotoxicology and response to environmental factors (e.g., Caswell 1996; Knight et al. 2009). The same approach was introduced independently by Horiuchi et al. (2008) to decompose differences between two conditions by imagining a continuous path from one to the other.

The analysis starts by considering the change in *ξ* over time,

$$\frac{d\xi(t)}{dt} = \frac{d\xi(t)}{d\theta^{\mathsf{T}}(t)} \frac{d\theta(t)}{dt} \tag{4.73}$$

If the time series is evaluated at discrete times *t* = 1*,...,T* , then to first order

$$
\Delta\xi(t) \approx \frac{d\xi(t)}{d\theta^{\mathsf{T}}(t)} \Delta\theta(t) \qquad s \times 1 \tag{4.74}
$$

The contributions to the change in *ξ (t)* are displayed separately in a contribution matrix

$$\mathbf{C}(t) = \frac{d\boldsymbol{\xi}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}(t)} \mathcal{D}\left[\boldsymbol{\Delta}\boldsymbol{\theta}(t)\right] \qquad \boldsymbol{s} \times \boldsymbol{p} \tag{4.75}$$

the *(i, j)* entry of **C***(t)* is the contribution of the change in *θj (t)* to the change in *ξi(t)*. The contributions are additive over time, so the contributions of all the changes, integrated from *t*<sup>1</sup> to *t*2, are given by the entries of

$$\mathbf{C}\left(t\_1, t\_2\right) = \sum\_{t=t\_1}^{t\_2} \mathbf{C}(t) \tag{4.76}$$

Suppose the dependent variable is $\boldsymbol{\xi} = \boldsymbol{\eta}^{\dagger}$ and the parameter vector is $\boldsymbol{\theta} = \boldsymbol{\mu}$.

At each time and for each age, we aggregate the contributions from early and late mortality. Let **X** be an indicator matrix whose entries define whether a particular entry of **C***(t)* is to be counted as early or late:


$$x\_{ij} = \begin{cases} 1 & \theta\_j \text{ contributes to } \Delta \xi\_i \\ 0 & \text{otherwise} \end{cases} \tag{4.77}$$

Then

$$\mathbf{c}(t) = \left(\mathbf{C}(t) \circ \mathbf{X}\right)\mathbf{1} \tag{4.78}$$

is a vector giving the contributions to the change in *ξ* from the parameters chosen in **X**. Defining **X**early and **X**late gives changes at time *t* due to early and late mortality. The LTRE analysis is then

$$\mathbf{c}\_{\text{early}}(t\_1, t\_2) = \sum\_{t\_1}^{t\_2} \mathbf{c}\_{\text{early}}(t) \tag{4.79}$$

and similarly for **c**late*(t*1*, t*2*)*.
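Equations (4.74)–(4.78) can be sketched with a two-step synthetic example (Python/NumPy; the mortality series and the early/late split are invented), using life expectancy as the outcome and checking the first-order decomposition against the observed change:

```python
import numpy as np

def eta1_of(mu):
    """Remaining life expectancies, Eq. (4.21)."""
    p = np.exp(-mu)
    s = len(mu)
    U = np.diag(p[:-1], k=-1)
    return np.ones(s) @ np.linalg.inv(np.eye(s) - U)

def jac(mu):
    """d eta1 / d mu^T, from Eqs. (4.35) and (4.62)."""
    p = np.exp(-mu)
    s = len(mu)
    U = np.diag(p[:-1], k=-1)
    N = np.linalg.inv(np.eye(s) - U)
    L = np.diag(np.ones(s - 1), k=-1)
    dvecU = -np.diag(L.flatten(order='F')) @ np.kron(np.eye(s), np.ones((s, 1))) @ np.diag(p)
    return np.kron(N.T, (np.ones(s) @ N).reshape(1, -1)) @ dvecU

# invented two-step mortality "time series": a small decline at every age
mu_t = [np.array([0.050, 0.010, 0.020, 0.100]),
        np.array([0.045, 0.009, 0.018, 0.090])]

dmu = mu_t[1] - mu_t[0]                  # Eq. (4.72)
C = jac(mu_t[0]) @ np.diag(dmu)          # contribution matrix, Eq. (4.75)

# first-order check: row sums of C approximate the observed change, Eq. (4.74)
observed = eta1_of(mu_t[1]) - eta1_of(mu_t[0])

# aggregate contributions with an indicator matrix, Eq. (4.78)
s = 4
X_early = np.zeros((s, s)); X_early[:, :2] = 1.0   # call the first two ages "early"
c_early = (C * X_early).sum(axis=1)
```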

As an example, Fig. 4.3a, b shows a time series of life expectancy (increasing from about 40 to 80 years between 1800 and 2010) and life disparity for Swedish females, based on data from the Human Mortality Database (2016). As in most developed countries, life disparity at birth dropped dramatically from 1850 to about 1950 (e.g., Edwards 2011; Vaupel et al. 2011). Declines at later ages were less dramatic, and remaining life disparity conditional on survival to age 50 has been almost flat (Engelman et al. 2014). How did changes in early and late mortality contribute to these patterns?

**Fig. 4.3** (**a**) Historical trends in life expectancy at birth from 1800 to 2010. (**b**) Historical trends in life disparity (mean years of life lost due to mortality) for ages 0 and 50 years. (**c**) Contributions from early and late mortality improvement to the change in disparity at age 0. (**d**) The contributions for disparity at age 50. (Data for Swedish females, from the Human Mortality Database)

Figure 4.3c, d show the cumulative sums of the contributions **c**early and **c**late, and their total, for ages 0 and 50. The decline in life disparity at birth was driven almost completely by improvements in early mortality, which completely overshadowed a small increase in disparity that was generated by improvements in late life mortality. The picture for remaining life disparity at age 50 is different: the contributions from changes in early and late life mortality almost completely cancel each other out. These patterns, looking at the details of a single time series, agree with the much more general exploration of multiple countries, using a different approach, by Vaupel et al. (2011).

The accuracy of the decomposition can be evaluated by comparing the time series calculated from the total contributions, as shown in Fig. 4.3c, d, with the observed series, as shown in Fig. 4.3b. The agreement is extremely close; the LTRE decomposition captures the end result of the historical changes from 1800 to 2010 with an error of less than 0.1%.

#### **4.6 Conclusion**

This chapter and Chap. 3 contain examples of different approaches to the sensitivity analysis of population growth rate and longevity, respectively. The power and flexibility of matrix calculus methods are apparent: the models are not restricted to age- or stage-classification, the absorbing states may be a single category of death or some more diverse set, the demographic outcomes are not limited to expectations, and the independent variables, the parameters that are being perturbed, can be anything of interest. The only requirement is that a chain of functional dependence can be followed: the outcome *ξ* depends on **U**, which depends on **p**, which depends on *μ*, and so on. Mortality might depend on health status, which might depend on income level, which might depend on education, and so on. The sensitivity of *ξ* to any of these parameters is an application of the chain rule.




# **Chapter 5 Individual Stochasticity and Implicit Age Dependence**

#### **5.1 Introduction**

Demography is the study of the population consequences of the fates of individuals. As an individual organism develops through its life cycle it may increase in size, change its morphology, develop new physiological functions, exhibit new behaviors, or move to new locations. It may marry and divorce, become ill and recover, or change its employment status. It may change sex and/or change its reproductive status. These changes can be dramatic. This developmental process, and its attendant risks of death and opportunities for reproduction, determine the rates of birth and death that, in turn, determine population growth or decline.

Individuals are differentiated on the basis of age or, more generally, life cycle stages. The movement of an individual through its life cycle is a random process, and although the eventual destination (death) is certain, the pathways taken to that destination are stochastic and will differ even between identical individuals; this is *individual stochasticity*. A stage-classified demographic model contains implicit age-specific information, which can be analyzed using Markov chain methods. The living stages in the life cycle are transient states in an absorbing Markov chain, in which death is an absorbing state.

This chapter presents Markov chain methods for computing the mean and variance of the lifetime number of visits to any transient state, the mean and variance of longevity, the net reproductive rate *R*<sub>0</sub>, and the cohort generation time. It presents the matrix calculus methods needed to calculate the sensitivity and elasticity of all these indices to any life history parameters.

Chapter 5 is modified from Caswell, H. 2009. Stage, age, and individual stochasticity in demography. The Per Brinck Oikos Award Lecture 2008. Oikos 118:1763–1782. ©Hal Caswell

The Markov chain approach is then generalized to variable environments (deterministic environmental sequences, periodic environments, iid random environments, Markovian environments). Variable environments are analyzed using the vec-permutation method to create a model that classifies individuals jointly by stage and environmental condition. Throughout, examples are presented using the North Atlantic right whale (*Eubalaena glacialis*) and an endangered prairie plant (*Lomatium bradshawii*) in a stochastic fire environment.

#### *5.1.1 Age and Stage, Implicit and Explicit*

The essence of demography is the connection between the fates of individual organisms and the dynamics of populations. There exist diverse mathematical frameworks in which this connection can be studied (Keyfitz 1967; Metz and Diekmann 1986; Nisbet and Gurney 1982; Caswell 1989; Tuljapurkar and Caswell 1997; Caswell et al. 1997; DeAngelis and Gross 1992; Ellner et al. 2016). Regardless of the type of equations used, demographic analysis must account for differences among individuals, and the ways in which those differences affect the vital rates.

Among the many ways that individuals may differ, age has long had a kind of conceptual priority. Age is universal in the sense that every organism becomes one minute older with the passage of one minute of time. Age is also often associated with predictable changes in the vital rates. However, in some organisms characteristics other than age provide more and better information about an individual. Ecologists recognized this long ago, and have developed demographic theory based on size, maturity, physiological condition, instar, spatial location, etc.—referred to in general as "stage-classified" demography. Human demographers, who were responsible for the classical age-classified theory, by no means deny the importance of other properties, such as employment, parity, or health status; see Land and Rogers (1982), Goldman (1994), Robine et al. (2003), and Willekens (2014) for a sample of the kinds of issues that arise.

Even when the demographic model is entirely stage-classified, however, age is still implicitly present. Individuals in a given stage may differ in age, and individuals of a given age may be found in many different stages, but each individual still becomes one unit of age older with the passage of each unit of time. Extracting this implicit age-dependent information makes it possible to calculate interesting age-specific properties, such as survivorship, longevity, life expectancy, generation time, and net reproductive rate (Cochran and Ellner 1992; Caswell 2001, 2006; Tuljapurkar and Horvitz 2006; Horvitz and Tuljapurkar 2008).<sup>1</sup>

<sup>1</sup>*Explicit* age and stage dependence is explored in Chap. 6; see also Caswell and Salguero-Gómez (2013) and Caswell et al. (2018).

In this chapter, I show how to calculate some of these implicit age-specific properties from any stage-classified model. The trick is to formulate the life cycle as a Markov chain, and to generalize the "life" cycle to include death as a stage. Because death is permanent, it is called an *absorbing state*, and the theory of absorbing Markov chains provides the starting point for our analysis (Feichtinger 1971; Caswell 2001).

A Markov chain is a stochastic model for the movement of a particle among a set of states (e.g., Kemeny and Snell 1976; Iosifescu 1980). The probability distribution of the next state of the particle may depend on the current state, but not on earlier states. In our context, a "particle" is an individual organism. The states correspond to the stages of the life cycle, plus death (or perhaps multiple types of death, for example deaths due to different causes). This structure is ideally suited to asking questions about individual stochasticity, because it accounts for all the possible pathways, and their probabilities, that an individual can follow through its life. I will focus on discrete-time models, but much of the theory can no doubt be generalized to continuous-time models.

The use of Markov chains in demographic analysis is not new. As far as I know, Feichtinger (1971, 1973) was the first to use discrete-time absorbing Markov chains in demography, paying particular attention to competing risks and multiple causes of death. At around the same time, Hoem (1969) applied continuous-time Markov chains in the analysis of insurance systems (with states such as "active," "disabled," and "dead"). Later, Cochran and Ellner (1992) independently proposed the use of Markov chains to generate age-classified statistics from stage-classified models, but minimized the use of matrix notation in their presentation. Influenced by Feichtinger's work, and relying heavily on Iosifescu's (1980) treatment of absorbing Markov chains, I extended the calculations using matrix notation (Caswell 2001; Keyfitz and Caswell 2005), introduced sensitivity analysis (Caswell 2006), and presented results for both time-invariant and time-varying models. At the same time, Tuljapurkar and Horvitz (2006) and Horvitz and Tuljapurkar (2008) developed the same approaches and presented a more extensive investigation of time variation.

#### *5.1.2 Individual Stochasticity and Heterogeneity*

Consider a newborn individual. As it develops through the stages of its life cycle, it may grow, shrink, mature, move, reproduce, and allocate resources among its biological processes. At each moment, it is exposed to various mortality risks. At each moment, it has some chance of reproducing. Because these processes are stochastic, the lives of any two individuals may differ. These random outcomes—this *individual stochasticity*—imply that the age-specific properties of an individual (say, longevity) are random variables: there is a distribution among individuals that should be characterized by its mean, moments, etc. (Caswell 2009).

It is critical to notice that the calculation of these moments explicitly assumes that every individual in a given stage experiences exactly the same rates and hazards. There is no heterogeneity among the individuals (or, at least, none that matters demographically), even though there is variation in their lifetime properties. Empirical studies of longevity or lifetime reproductive output find that the variation among individuals is usually large, but it is a mistake to jump to the conclusion that it is due to heterogeneity among individuals without first examining the variance that is inevitably created by individual stochasticity (e.g., Tuljapurkar et al. 2009; Steiner and Tuljapurkar 2012; Caswell 2011; Caswell and Kluge 2015; Caswell and Vindenes 2018; Hartemink et al. 2017; Hartemink and Caswell 2018; van Daalen and Caswell 2017).

#### *5.1.3 Examples*

The calculations will be demonstrated by means of two case studies. The first is a stage-classified model for the North Atlantic right whale (*Eubalaena glacialis*). Later, in Sect. 5.5.4, a stochastic matrix model for the threatened prairie plant *Lomatium bradshawii* will appear as part of a study of variable environments.

The North Atlantic right whale is a large, highly endangered baleen whale (Kraus and Rolland 2007). Once abundant in the north Atlantic, it was decimated by whaling, beginning as much as a thousand years ago (Reeves et al. 2007). By 1900 the eastern North Atlantic stock had been effectively eliminated, and the western North Atlantic stock hunted to near extinction. The population has recovered only slowly since receiving at least nominal protection in 1935, and now numbers only about 300 individuals. Right whales migrate along the Atlantic coast of North America, from summer feeding grounds in the Gulf of Maine and Bay of Fundy to winter calving grounds off the Southeastern U.S. They are killed by ship collisions and entanglement in fishing gear (Kraus et al. 2005), and may also be affected by pollution of coastal waters.

Individual right whales are photographically identifiable by scars and callosity patterns. Since 1980, the New England Aquarium has surveyed the population, accumulating a database of over 10,000 sightings (Crone and Kraus 1990). Treating the first year of identification of an individual as marking, and each year of resighting as a recapture, permits the use of mark-recapture statistics to estimate demographic parameters of this endangered population (Caswell et al. 1999; Fujiwara and Caswell 2001, 2002; Caswell and Fujiwara 2004).

Figure 5.1 shows a life cycle graph used by Caswell and Fujiwara (2004) as the basis of a stage-structured matrix population model for the right whale. The stages are calves, immature females, mature but non-reproductive females, mothers, and "resting" mothers (because of the long period of parental care and gestation, right whales do not reproduce in the year after giving birth). This life cycle is typical of large, long-lived monovular mammals and birds.

**Fig. 5.1** Absorbing Markov chain transition graph for females of the North Atlantic right whale (*Eubalaena glacialis*). Projection interval is 1 year. Stages: 1 = calf, 2 = immature, 3 = mature, 4 = mother, 5 = post-breeding female, 6 = death. See Caswell and Fujiwara (2004) for explanation and parameter estimates

The model is parameterized in terms of survival probabilities *σ*<sub>1</sub>, ..., *σ*<sub>5</sub>, the probability of maturation *γ*<sub>2</sub>, and the birth probability *γ*<sub>3</sub>. The projection matrix is

$$\mathbf{A} = \begin{pmatrix} 0 & 0 & F & 0 & 0 \\ \sigma\_1 & \sigma\_2 (1 - \gamma\_2) & 0 & 0 & 0 \\ 0 & \sigma\_2 \gamma\_2 & \sigma\_3 (1 - \gamma\_3) & 0 & \sigma\_5 \\ 0 & 0 & \sigma\_3 \gamma\_3 & 0 & 0 \\ 0 & 0 & 0 & \sigma\_4 & 0 \end{pmatrix} \tag{5.1}$$

The fertility term in the (1, 3) position is *F* = 0.5 *σ*<sub>3</sub>*γ*<sub>3</sub>√*σ*<sub>4</sub>, accounting for the sex ratio, the survival of mature females, their probability of giving birth if they survive, and the effect of survival of the mother on survival of the calf. For reasons related to parameter estimation, *σ*<sub>5</sub> is constrained to equal *σ*<sub>3</sub>.

#### **5.2 Markov Chains**

The familiar life cycle graph (e.g., Fig. 5.1) corresponds to a projection matrix **A**, in which *aij* gives the per-capita production of stage *i* individuals at *t* + 1 by a stage *j* individual at *t*. This production may occur by the transition of an individual from stage *j* to stage *i*, or by the production of one or more new individuals (by reproduction, fragmentation, etc.). So, we partition **A** into a matrix **U** describing transition probabilities of extant individuals and a matrix **F** describing the production of new individuals

$$\mathbf{A} = \mathbf{U} + \mathbf{F} \tag{5.2}$$

The column sums of **U** are all less than or equal to 1. Because individuals eventually die and pass out of the stages contained in **U**, those stages are called transient states.

#### *5.2.1 An Absorbing Markov Chain*

If we include death explicitly (Fig. 5.1) and remove the arcs representing reproduction, we obtain the graph corresponding to the transition matrix for an absorbing Markov chain

$$\mathbf{P} = \begin{pmatrix} \mathbf{U} & \mathbf{0} \\ \mathbf{m}^{\mathsf{T}} & 1 \end{pmatrix} \tag{5.3}$$

The element *m*<sub>*j*</sub> of the vector **m** is the probability of mortality of an individual in stage *j*. Death is an absorbing state. I will assume that at least one absorbing state is accessible from any transient state in **U**, and that the spectral radius of **U** is strictly less than 1. This guarantees that, with probability 1, every individual ends up in the absorbing state.

**The right whale** Fujiwara estimated **U** by applying multi-stage mark-recapture methods to the photographic identification catalog. Although the best model, out of a large number evaluated, included significant time variation in survival and birth rates, here I will analyze a single matrix obtained from a time-invariant model. The complete transient matrix **U** and the fertility matrix **F** are

$$\mathbf{U} = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0.90 & 0.85 & 0 & 0 & 0 \\ 0 & 0.12 & 0.71 & 0 & 1.00 \\ 0 & 0 & 0.29 & 0 & 0 \\ 0 & 0 & 0 & 0.85 & 0 \end{pmatrix} \tag{5.4}$$

$$\mathbf{F} = \begin{pmatrix} 0 & 0 & 0.13 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \tag{5.5}$$
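As a concrete check on this bookkeeping, the construction of **A** = **U** + **F** in (5.2) and of the absorbing-chain matrix **P** in (5.3) can be sketched numerically. This is an illustration only, not code from the original text; it uses Python/NumPy with the rounded entries of (5.4) and (5.5), and computes the mortality vector **m** as the complement of the column sums of **U**:

```python
import numpy as np

# Rounded right-whale matrices, as read from (5.4)-(5.5)
U = np.array([
    [0.00, 0.00, 0.00, 0.00, 0.00],
    [0.90, 0.85, 0.00, 0.00, 0.00],
    [0.00, 0.12, 0.71, 0.00, 1.00],
    [0.00, 0.00, 0.29, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.85, 0.00],
])
F = np.zeros((5, 5))
F[0, 2] = 0.13          # calves produced by mature females (stage 3)

A = U + F               # projection matrix (5.2)

# Mortality probabilities: whatever survival does not account for
m = 1.0 - U.sum(axis=0)

# Absorbing-chain transition matrix (5.3), with death as state 6
P = np.zeros((6, 6))
P[:5, :5] = U
P[5, :5] = m
P[5, 5] = 1.0

print(P.sum(axis=0))                        # every column sums to 1
print(np.max(np.abs(np.linalg.eigvals(U)))) # spectral radius of U, below 1
```

Every column of **P** sums to 1 (it is column-stochastic), and the spectral radius of **U** is about 0.97, so absorption (death) is certain but slow, consistent with the long lifespans of this species.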

#### *5.2.2 Occupancy Times and the Fundamental Matrix*

As the syllogism asserts, all men are mortal; absorption is certain. Our question is, how long does absorption take, and what happens en route? From a demographic perspective, this is asking about the lifespan of an individual and the events that happen during that lifetime. The key to these questions is the *fundamental matrix* of the absorbing Markov chain. Consider an individual presently in transient state *j*. As time passes, it will visit other transient states, repeating some, skipping others, until it eventually dies. Let *ν*<sub>*ij*</sub> denote the number of visits to, or the occupancy time in, transient state *i* that our individual, starting in transient state *j*, makes before being absorbed. The *ν*<sub>*ij*</sub> are random variables, reflecting individual stochasticity.

The entries of the matrix **U** give the probabilities of visiting each of the transient states after one time step. The entries of **U**<sup>2</sup> give the probabilities of visiting each of the transient states after two time steps. Adding the powers of **U** gives the expected number of visits to each transient state, over a lifetime, in a matrix **N**; i.e.,

$$\mathbf{N} = \left( E(\nu\_{ij}) \right) = \sum\_{t=0}^{\infty} \mathbf{U}^{t} = (\mathbf{I} - \mathbf{U})^{-1} . \tag{5.6}$$

**The right whale** The fundamental matrix for the right whale is calculated from (5.6) to be

$$\mathbf{N} = \begin{pmatrix} 1.00 & 0.00 & 0.00 & 0.00 & 0.00 \\ 5.88 & 6.52 & 0.00 & 0.00 & 0.00 \\ 16.35 & 18.11 & 22.94 & 19.49 & 22.94 \\ 4.74 & 5.25 & 6.65 & 6.65 & 6.65 \\ 4.02 & 4.46 & 5.65 & 5.65 & 6.65 \end{pmatrix} . \tag{5.7}$$

The first column corresponds to calves. On average, a calf will spend 1 year as a calf, 5.9 years as a juvenile, 16.3 years as a mature but non-breeding female, etc. Row 4 of **N** is of particular interest. Stage 4 represents mothers, so *n*<sub>4*j*</sub> is the expected number of reproductive events that a female in stage *j* will experience during her remaining lifetime. Based on this model, a newborn calf could expect to give birth *n*<sub>41</sub> = 4.74 times. A mature female could expect to give birth *n*<sub>43</sub> = 6.65 times; the difference reflects the likelihood of mortality between birth and maturity.<sup>2</sup>
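A quick numerical sketch (Python/NumPy; an illustration, not the book's own code) confirms that the single matrix inversion in (5.6) reproduces the infinite sum of powers of **U**. With the rounded rates of (5.4), the entries differ slightly from (5.7), which was computed from unrounded estimates:

```python
import numpy as np

U = np.array([
    [0.00, 0.00, 0.00, 0.00, 0.00],
    [0.90, 0.85, 0.00, 0.00, 0.00],
    [0.00, 0.12, 0.71, 0.00, 1.00],
    [0.00, 0.00, 0.29, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.85, 0.00],
])

# Fundamental matrix (5.6): expected occupancy times of each transient state
N = np.linalg.inv(np.eye(5) - U)

# The same matrix as a truncated sum of powers of U
S = np.zeros((5, 5))
Ut = np.eye(5)
for _ in range(5000):
    S += Ut
    Ut = Ut @ U

print(np.max(np.abs(N - S)))   # effectively zero: the two computations agree
print(N[3, 0])                 # expected lifetime births of a calf (book: 4.74)
```

With these rounded rates the expected number of reproductive events for a calf comes out at 4.8 rather than the published 4.74; the structure of the calculation is the point, not the third digit.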

We would like to know how the entries of **N** vary in response to changes in the vital rates. To accomplish this, we need matrix calculus, which is the topic of the next section.

<sup>2</sup>Note that *n*<sub>43</sub> = *n*<sub>44</sub> = *n*<sub>45</sub> = *n*<sub>55</sub> and *n*<sub>53</sub> = *n*<sub>54</sub>. This seems to be due to the fact, specific to these data, that the survival probability of stages 3 and 5 is indistinguishable from 1.0, and influences the results below.

#### *5.2.3 Sensitivity of the Fundamental Matrix*

Let us apply matrix calculus to find the sensitivity of the fundamental matrix **N** (Caswell 2006). This result will appear in the sensitivity analysis of most other demographic quantities. Let *θ* be a vector of parameters (of dimension *p* × 1) on which the entries of the transition matrix **U** depend. The fundamental matrix satisfies

$$\mathbf{I} = \mathbf{N} \mathbf{N}^{-1}.\tag{5.8}$$

Differentiating both sides gives

$$\mathbf{0} = (d\mathbf{N})\mathbf{N}^{-1} + \mathbf{N}\left(d\mathbf{N}^{-1}\right). \tag{5.9}$$

Applying the vec operator and Roth's theorem to both sides gives

$$\operatorname{vec}\mathbf{0} = \left[ \left( \mathbf{N}^{-1} \right)^{\mathsf{T}} \otimes \mathbf{I}\_{\mathsf{s}} \right] d\operatorname{vec}\mathbf{N} + \left( \mathbf{I}\_{\mathsf{s}} \otimes \mathbf{N} \right) d\operatorname{vec}\mathbf{N}^{-1} \tag{5.10}$$

Noting that **N**<sup>−1</sup> = **I** − **U**, so that *d*vec **N**<sup>−1</sup> = −*d*vec **U**, and solving for *d*vec **N** gives

$$d\text{vec}\,\mathbf{N} = \left[ \left( \mathbf{N}^{-1} \right)^{\mathsf{T}} \otimes \mathbf{I}\_{\mathsf{s}} \right]^{-1} \left( \mathbf{I}\_{\mathsf{s}} \otimes \mathbf{N} \right) d\text{vec}\,\mathbf{U} \tag{5.11}$$

To simplify this, it helps to know two facts about the Kronecker product:

$$(\mathbf{A}\otimes\mathbf{B})^{-1}=\mathbf{A}^{-1}\otimes\mathbf{B}^{-1}\tag{5.12}$$

$$(\mathbf{A}\otimes\mathbf{B})\left(\mathbf{C}\otimes\mathbf{D}\right)=\left(\mathbf{A}\mathbf{C}\otimes\mathbf{B}\mathbf{D}\right)\tag{5.13}$$

provided that the sizes of the matrices permit the indicated operations. Thus *d*vec **N** in (5.11) simplifies to

$$d\text{vec}\,\mathbf{N} = \left(\mathbf{N}^{\mathsf{T}} \otimes \mathbf{N}\right)d\text{vec}\,\mathbf{U} \tag{5.14}$$

The identification theorem (2.47) implies

$$\frac{d\,\text{vec}\,\mathbf{N}}{d\,(\text{vec}\,\mathbf{U})^{\mathsf{T}}} = \mathbf{N}^{\mathsf{T}} \otimes \mathbf{N} \tag{5.15}$$

and the chain rule permits us to write

$$\frac{d\mathbf{vec}\,\mathbf{N}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\mathbf{N}^{\mathsf{T}} \otimes \mathbf{N}\right) \frac{d\mathbf{vec}\,\mathbf{U}}{d\boldsymbol{\theta}^{\mathsf{T}}}\tag{5.16}$$

**Fig. 5.2** Elasticity, to each of the vital rates, of reproductive outcomes in the right whale. (**a**) The elasticity of the expected lifetime number of reproductive events, *E*(*ν*<sub>41</sub>). (**b**) The elasticity of the variance in the lifetime number of reproductive events, *V*(*ν*<sub>41</sub>). Vital rates: *s*<sub>1</sub>–*s*<sub>4</sub> are survival probabilities (*s*<sub>5</sub> = *s*<sub>3</sub> by assumption in this model); *g*<sub>2</sub> is the probability of maturation, and *g*<sub>3</sub> is the probability of reproduction

The left-hand side of (5.16) is a matrix, of dimension *s*<sup>2</sup> × *p*, containing the sensitivity of every entry of **N** to every parameter in *θ*. The matrix *d*vec **U***/dθ*<sup>T</sup> is an *s*<sup>2</sup> × *p* matrix containing the sensitivities of all the elements of **U** to all the elements of *θ*. From (2.55), the elasticity of the fundamental matrix is given by

$$\frac{\epsilon \mathbf{vec} \,\mathbf{N}}{\epsilon \theta^{\mathsf{T}}} = \mathcal{D} \,(\mathbf{vec} \,\mathbf{N})^{-1} \, \frac{d \mathbf{vec} \,\mathbf{N}}{d \theta^{\mathsf{T}}} \,\mathcal{D} \,(\theta) \tag{5.17}$$

**The right whale** As an example, we use (5.16) and (5.17) to calculate the elasticity of the expected lifetime number of reproductive events, *E*(*ν*<sub>41</sub>) = *n*<sub>41</sub>, with respect to the survival probabilities *σ*<sub>1</sub>, ..., *σ*<sub>4</sub>, the maturation probability *γ*<sub>2</sub>, and the breeding probability *γ*<sub>3</sub>. Figure 5.2 shows that the number of breeding events is most elastic to mature female survival (*σ*<sub>3</sub>), and less so to the survival of immature females or mothers (*σ*<sub>2</sub> and *σ*<sub>4</sub>). Changes in the probability of giving birth, *γ*<sub>3</sub>, have, remarkably enough, no impact on the expected number of reproductive events.

The elasticity of *n*<sub>41</sub> to *σ*<sub>3</sub> (survival of mature females) is approximately 30. This implies that a 1% increase in *σ*<sub>3</sub> would produce about a 30% increase in the expected number of reproductive events.
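The Kronecker-product result (5.15) is easy to check numerically: perturb one entry of **U**, recompute **N**, and compare the finite difference with the corresponding column of **N**<sup>T</sup> ⊗ **N**. A minimal sketch (Python/NumPy; illustrative only, using the rounded rates of (5.4)):

```python
import numpy as np

U = np.array([
    [0.00, 0.00, 0.00, 0.00, 0.00],
    [0.90, 0.85, 0.00, 0.00, 0.00],
    [0.00, 0.12, 0.71, 0.00, 1.00],
    [0.00, 0.00, 0.29, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.85, 0.00],
])
s = 5
N = np.linalg.inv(np.eye(s) - U)

# Analytic sensitivity (5.15): d vec N / d (vec U)' = N' kron N
dN_dU = np.kron(N.T, N)

# Finite-difference check on u_21 (calf survival, sigma_1)
i, j, h = 1, 0, 1e-6
Up = U.copy()
Up[i, j] += h
dN_num = (np.linalg.inv(np.eye(s) - Up) - N) / h

# Column of the analytic matrix for entry (i,j); vec stacks columns,
# so entry (i,j) of U corresponds to column j*s + i
dN_ana = dN_dU[:, j * s + i].reshape(s, s, order="F")
print(np.max(np.abs(dN_num - dN_ana)))   # tiny: the two agree
```

The `order="F"` reshapes mirror the column-stacking convention of the vec operator; mixing it up with NumPy's default row-major order is a common source of errors in these calculations.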

#### **5.3 From Stage to Age**

The fundamental matrix summarizes the age-specific information implicit in the transient matrix **U**, even if the model is stage-classified and age does not appear explicitly. We now extend this, to explore a series of age-specific demographic indices and their sensitivity analyses. Some are well known (*R*<sub>0</sub>, generation time), others little explored (variance in longevity, for example). They can, however, all be easily calculated from any stage-classified model.

#### *5.3.1 Variance in Occupancy Time*

The occupancy time in any transient state is a random variable; the fundamental matrix **N** gives its mean. Some individuals will visit that state more often, some less often, some not at all. This basic property of individual stochasticity can be described by the variance of *ν*<sub>*ij*</sub>. Iosifescu (1980, Theorem 3.1) gives a formula for all the moments of the *ν*<sub>*ij*</sub>; from this we can calculate the matrix of variances

$$\mathbf{V} = \left( V(\nu\_{ij}) \right) = \left( 2\mathbf{N}\_{\rm dg} - \mathbf{I} \right) \mathbf{N} - \mathbf{N} \circ \mathbf{N} \tag{5.18}$$

(Caswell 2006) where ◦ denotes the Hadamard, or element-by-element, product and **N**<sub>dg</sub> is a matrix with the diagonal elements of **N** on its diagonal and zeros elsewhere. The standard deviations of the occupancy times are the square roots of the elements of **V**.
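Equation (5.18) can be tested against a case with a known answer. For a single transient state with self-loop probability *p*, the occupancy time is geometric, with mean 1/(1 − *p*) and variance *p*/(1 − *p*)<sup>2</sup>. A minimal sketch (Python/NumPy; the helper name `occupancy_moments` is mine, not from the text):

```python
import numpy as np

def occupancy_moments(U):
    """Mean (5.6) and variance (5.18) of occupancy times for a transient matrix U."""
    s = U.shape[0]
    N = np.linalg.inv(np.eye(s) - U)
    Ndg = np.diag(np.diag(N))
    V = (2.0 * Ndg - np.eye(s)) @ N - N * N   # elementwise * is the Hadamard product
    return N, V

# One transient state with self-loop p: occupancy is geometric, so the
# mean is 1/(1-p) and the variance is p/(1-p)^2
p = 0.8
N1, V1 = occupancy_moments(np.array([[p]]))
print(N1[0, 0], V1[0, 0])   # approximately 5.0 and 20.0
```

For *p* = 0.8 the formula returns mean 5 and variance 20, matching the geometric-distribution results exactly; applied to the right whale **U**, the same function reproduces the matrices shown next.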

**The right whale** For the right whale, the matrix of variances calculated from (5.18) is

$$\mathbf{V} = \begin{pmatrix} 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\ 36.18 & 35.95 & 0.00 & 0.00 & 0.00 \\ 466.44 & 484.80 & 503.32 & 494.86 & 503.32 \\ 35.80 & 36.98 & 37.54 & 37.54 & 37.54 \\ 33.28 & 34.94 & 37.54 & 37.54 & 37.54 \end{pmatrix}, \tag{5.19}$$

and the corresponding standard deviations are

$$
\begin{pmatrix}
0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\
6.02 & 6.00 & 0.00 & 0.00 & 0.00 \\
21.60 & 22.02 & 22.43 & 22.25 & 22.43 \\
5.98 & 6.08 & 6.13 & 6.13 & 6.13 \\
5.77 & 5.91 & 6.13 & 6.13 & 6.13
\end{pmatrix}. \tag{5.20}
$$

The variance in the *ν*<sub>*ij*</sub> is the result of luck, not heterogeneity. That is, it is the variance among a group of individuals all experiencing exactly the same stage-specific transition and mortality probabilities in **U**. As such, it can provide a null model for studies of heterogeneity in quantities such as the number of reproductive events. This idea has been explored independently, and in more detail, by Tuljapurkar and colleagues (Tuljapurkar et al. 2009; Steiner and Tuljapurkar 2012).

The sensitivity of the variance is derived in Appendix A.1 as

$$\frac{d\text{vec}\,\mathbf{V}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left[2\left(\mathbf{N}^{\mathsf{T}} \otimes \mathbf{I}\_{s}\right)\mathcal{D}\left(\text{vec}\,\mathbf{I}\_{s}\right) + 2\left(\mathbf{I}\_{s} \otimes \mathbf{N}\_{\text{dg}}\right) - \mathbf{I}\_{s^{2}} - 2\mathcal{D}\left(\text{vec}\,\mathbf{N}\right)\right]\frac{d\text{vec}\,\mathbf{N}}{d\boldsymbol{\theta}^{\mathsf{T}}}\tag{5.21}$$

Elasticities of **V** are calculated using (2.55).

**Hint** Before looking at Appendix A.1, to derive (5.21), write **N**<sub>dg</sub> = **I** ◦ **N**, differentiate (5.18), and use the fact that vec(**A** ◦ **B**) = D(vec **A**) vec **B** = D(vec **B**) vec **A**.

**The right whale** The elasticities of *V*(*ν*<sub>41</sub>), calculated from (5.21) and (5.17), are shown in Fig. 5.2b. They are roughly proportional to the elasticities of *E*(*ν*<sub>41</sub>); that is, the vital rates that have large effects on the expected number of reproductive events also have large effects on the variance.
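Taking *θ* = vec **U**, so that *d*vec **N***/dθ*<sup>T</sup> = **N**<sup>T</sup> ⊗ **N** by (5.15), formula (5.21) can be checked directly against finite differences. A sketch (Python/NumPy; illustrative only, using the rounded right whale rates):

```python
import numpy as np

U = np.array([
    [0.00, 0.00, 0.00, 0.00, 0.00],
    [0.90, 0.85, 0.00, 0.00, 0.00],
    [0.00, 0.12, 0.71, 0.00, 1.00],
    [0.00, 0.00, 0.29, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.85, 0.00],
])
s = 5

def variance_matrix(U):
    N = np.linalg.inv(np.eye(s) - U)
    Ndg = np.diag(np.diag(N))
    return N, (2.0 * Ndg - np.eye(s)) @ N - N * N    # (5.18)

N, V = variance_matrix(U)
vecN = N.flatten(order="F")
dN = np.kron(N.T, N)                                 # (5.15), with theta = vec U

# (5.21): sensitivity of vec V to vec U
DvecI = np.diag(np.eye(s).flatten(order="F"))        # D(vec I)
dV = (2 * np.kron(N.T, np.eye(s)) @ DvecI
      + 2 * np.kron(np.eye(s), np.diag(np.diag(N)))
      - np.eye(s * s) - 2 * np.diag(vecN)) @ dN

# Finite-difference check on u_21 (vec U column index 0*s + 1 = 1)
h = 1e-6
Up = U.copy()
Up[1, 0] += h
_, Vp = variance_matrix(Up)
num = (Vp - V).flatten(order="F") / h
print(np.max(np.abs(num - dV[:, 1])))   # small: analytic and numerical agree
```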

#### *5.3.2 Longevity and Life Expectancy*

Longevity is an important demographic characteristic (Carey 2003). Mean longevity, or life expectancy, is one of the most widely reported demographic statistics, used to compare populations, species, countries, regions, historical periods, etc., and to examine the effects of evolutionary, management, medical, and social processes. The longevity of an individual is the sum of the time spent in all of the transient states before final absorption. Let the random variable *η*<sub>*j*</sub> denote the longevity of an individual currently in stage *j*. Then

$$
\eta\_j = \sum\_i \nu\_{ij}. \tag{5.22}
$$

A vector *E(η)* of expected longevities, or life expectancies, is obtained by summing the columns of **N**:

$$E(\boldsymbol{\eta}^{\mathsf{T}}) = \mathbf{1}^{\mathsf{T}} \mathbf{N} \tag{5.23}$$

where **1** is a vector of ones. Often, life expectancy at birth is of primary interest. If stages are numbered so that birth corresponds to stage 1, then life expectancy at birth is

$$E(\eta\_1) = \mathbf{1}^{\mathsf{T}} \mathbf{N} \mathbf{e}\_1 \tag{5.24}$$

where **e**<sub>1</sub> is a vector with 1 in the first entry and zeros elsewhere.
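In matrix form the computation is a one-liner. A sketch (Python/NumPy; illustrative only, using the rounded rates of (5.4), so the values differ slightly from those computed from the unrounded estimates):

```python
import numpy as np

U = np.array([
    [0.00, 0.00, 0.00, 0.00, 0.00],
    [0.90, 0.85, 0.00, 0.00, 0.00],
    [0.00, 0.12, 0.71, 0.00, 1.00],
    [0.00, 0.00, 0.29, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.85, 0.00],
])
N = np.linalg.inv(np.eye(5) - U)

# (5.23): life expectancies of the stages are the column sums of N
eta = np.ones(5) @ N
print(np.round(eta, 1))

# (5.24): life expectancy at birth, with e_1 selecting the calf stage
e1 = np.zeros(5)
e1[0] = 1.0
print(np.ones(5) @ N @ e1)   # about 32 years for a calf
```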

The sensitivity of life expectancy in age-classified models has been studied by Pollard (1982) and Keyfitz (1971); see Keyfitz and Caswell (2005, Section 4.3), Vaupel (1986), and Vaupel and Canudas Romo (2003).

For more general stage-classified models, the sensitivity of *E(η)* is (Caswell 2006)

$$\frac{dE(\eta)}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\mathbf{I}\_{\boldsymbol{s}} \otimes \mathbf{1}^{\mathsf{T}}\right) \left(\mathbf{N}^{\mathsf{T}} \otimes \mathbf{N}\right) \frac{d \text{vec}\,\mathbf{U}}{d\boldsymbol{\theta}^{\mathsf{T}}} \tag{5.25}$$

**Hint** To obtain (5.25), differentiate both sides of (5.23), apply the vec operator, and use (5.16) for the derivative of **N**. See Appendix A.2 for the derivation.

**The right whale** For the right whale, the vector of life expectancies is

$$E(\boldsymbol{\eta}^{\mathsf{T}}) = \left( \text{32.0 34.3 35.2 31.8 36.2} \right) \tag{5.26}$$

Because mortality rates vary relatively little among stages, the life expectancies of the stages differ by only about 15%. The life expectancy for a calf implied by these data was 32 years. The elasticities of life expectancy to the vital rates are shown in Fig. 5.3. Life expectancy is most elastic to mature female survival *σ*<sub>3</sub>, and less so to *σ*<sub>2</sub> and *σ*<sub>4</sub>. This partly reflects the longer time spent as a mature female, compared to an immature female or mother; see (5.7). The elasticity to the birth rate *γ*<sub>3</sub> is negative, because of the reduced survival of mothers. A 1% increase in *γ*<sub>3</sub> will lead to a 0.51% *decrease* in life expectancy. This is one possible measure of the cost of reproduction.

**Fig. 5.3** Elasticities of longevity for the right whale. (**a**) The elasticity, to each of the vital rates, of life expectancy for a female right whale calf. (**b**) The elasticity of the variance in longevity for a female right whale calf. Parameters as in Fig. 5.2

#### *5.3.3 Variance in Longevity*

Like the occupancy time in a transient state, longevity is a random variable, the variability of which is a measure of individual stochasticity. Individuals differ in longevity depending on the pathways taken from birth to death. This variance has been explored by human demographers, using life table methods, as one way of studying the inequality in life span generated by a given mortality schedule, and how that inequality has changed over time (e.g., Wilmoth and Horiuchi 1999; Shkolnikov et al. 2003; Edwards and Tuljapurkar 2005; Van Raalte and Caswell 2013).

The variance of the time to absorption is

$$V(\boldsymbol{\eta}^{\mathsf{T}}) = \mathbf{1}^{\mathsf{T}} \mathbf{N} \left(2\mathbf{N} - \mathbf{I}\right) - E\left(\boldsymbol{\eta}^{\mathsf{T}}\right) \circ E\left(\boldsymbol{\eta}^{\mathsf{T}}\right). \tag{5.27}$$

(Caswell 2006; Iosifescu 1980).
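Like the occupancy-time variance, (5.27) can be checked against the single-state geometric case, where longevity has mean 1/(1 − *p*) and variance *p*/(1 − *p*)<sup>2</sup>. A minimal sketch (Python/NumPy; the helper name `longevity_moments` is mine, not from the text):

```python
import numpy as np

def longevity_moments(U):
    """Mean (5.23) and variance (5.27) of longevity for a transient matrix U."""
    s = U.shape[0]
    N = np.linalg.inv(np.eye(s) - U)
    E_eta = np.ones(s) @ N                                        # (5.23)
    V_eta = np.ones(s) @ N @ (2.0 * N - np.eye(s)) - E_eta * E_eta  # (5.27)
    return E_eta, V_eta

# Single state with survival probability p: longevity is geometric
p = 0.9
E1, V1 = longevity_moments(np.array([[p]]))
print(E1[0], V1[0])   # mean 1/(1-p) = 10, variance p/(1-p)^2 = 90
```

Applied to the right whale **U**, the same two lines reproduce the large standard deviations of longevity reported below.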

The sensitivity of the variance in longevity is

$$\begin{aligned} \frac{dV(\boldsymbol{\eta})}{d\boldsymbol{\theta}^{\mathsf{T}}} &= \left[ 2 \left( \mathbf{N}^{\mathsf{T}} \otimes \mathbf{1}^{\mathsf{T}} \right) + 2 \left( \mathbf{I}\_{s} \otimes \mathbf{1}^{\mathsf{T}} \mathbf{N} \right) \right. \\ & \qquad \left. - \left( \mathbf{I}\_{s} \otimes \mathbf{1}^{\mathsf{T}} \right) - 2 \mathcal{D} \left( E \left( \boldsymbol{\eta} \right) \right) \left( \mathbf{I}\_{s} \otimes \mathbf{1}^{\mathsf{T}} \right) \right] \left( \mathbf{N}^{\mathsf{T}} \otimes \mathbf{N} \right) \frac{d\text{vec}\,\mathbf{U}}{d\boldsymbol{\theta}^{\mathsf{T}}} \end{aligned} \tag{5.28}$$

The first row of (5.28) is the sensitivity of the variance in longevity of an individual starting in stage 1.

**Hint** To derive (5.28), differentiate (5.27) and apply the vec operator and Roth's theorem to each term, using (5.25) for the derivative of *E(η)*. See Sect. A.3 for details.

**The right whale** For the right whale, the variance and standard deviation of longevity are given by

$$V(\boldsymbol{\eta}^{\mathsf{T}}) = \left(1157\ 1167\ 1172\ 1163\ 1172\right) \tag{5.29}$$

$$SD(\boldsymbol{\eta}^{\mathsf{T}}) = \left(34.0\ 34.2\ 34.2\ 34.1\ 34.2\right) \tag{5.30}$$

The life expectancy at birth of 32 years has a standard deviation of about 34 years. Note that this result implies a very long positive tail of longevity. The interpretation of this result is tricky; I will return to it in Sect. 5.7.

The elasticities of the variance of longevity of a calf are shown in Fig. 5.3b. The variance in longevity is increased by increases in *σ*<sub>3</sub>, less so by increases in *σ*<sub>2</sub> and *σ*<sub>4</sub>. The pattern of the elasticities is strikingly similar to that of the elasticities of *E*(*η*).

#### *5.3.4 Cohort Generation Time*

Generation time measures the typical age at which offspring are produced, or the age at which the typical offspring is produced. It appears in the IUCN criteria for classifying threatened species (IUCN Species Survival Commission 2001) as well as in various evolutionary considerations. There are several definitions of generation time (Coale 1972); here we will examine the cohort generation time, defined as the mean age of production of offspring in a cohort of newborn individuals. From the definition it is clear why calculation of generation time is a problem in stage-classified models, in which the age of parents does not appear. Moreover, in stage-classified models, individuals may be born into several stages (e.g., cleistogamous vs. chasmogamous seeds; Le Corff and Horvitz 2005), each with a different subsequent pattern of development, survival, and fertility. There could be a different generation time for each type of offspring, and if individuals may produce more than one type of offspring, the average age at which they are produced could differ from one kind of offspring to another.

Thus, we expect to have a generation time that measures the mean age of production of offspring of type *i* by an individual born in stage *j*. Write this as a vector *μ*<sup>(*j*)</sup>, with one entry for each offspring type. Then it can be shown (Sect. A.5) that

$$\boldsymbol{\mu}^{(j)} = \mathcal{D}\left(\mathbf{F}\mathbf{N}\mathbf{e}_j\right)^{-1} \mathbf{F}\mathbf{N}\mathbf{U}\mathbf{N}\mathbf{e}_j \tag{5.31}$$
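Equation (5.31) can be sketched numerically. The following Python/NumPy fragment uses a hypothetical 3-stage model with a single offspring type born into stage 1 (entries for offspring types that are never produced come out as `nan`, since (5.31) divides by zero lifetime production of that type).

```python
import numpy as np

# Hypothetical stage-classified model: offspring are born into stage 1.
U = np.array([[0.1, 0.0, 0.0],
              [0.6, 0.4, 0.0],
              [0.0, 0.5, 0.85]])
F = np.array([[0.0, 0.3, 1.2],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
s = U.shape[0]
N = np.linalg.inv(np.eye(s) - U)

def cohort_generation_time(j):
    """mu^(j) of Eq. (5.31): mean age of offspring production, by offspring
    type, for an individual born in stage j (0-based index)."""
    ej = np.zeros(s); ej[j] = 1.0
    lifetime_offspring = F @ N @ ej          # FN e_j
    weighted = F @ N @ U @ N @ ej            # FNUN e_j
    with np.errstate(divide='ignore', invalid='ignore'):
        return weighted / lifetime_offspring # D(FNe_j)^{-1} FNUNe_j

mu = cohort_generation_time(0)
print("cohort generation time (offspring type 1):", mu[0])
```

Because **X** in (5.32) is diagonal, the matrix inverse reduces to the elementwise division used here.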

The sensitivity of *μ*<sup>(*j*)</sup> is obtained by a methodical application of matrix calculus to (5.31). To simplify notation, define

$$\mathbf{X} = \mathcal{D}\left(\mathbf{F}\mathbf{N}\mathbf{e}_j\right) \tag{5.32}$$

$$\mathbf{r} = \mathbf{F}\mathbf{N}\mathbf{U}\mathbf{N}\mathbf{e}_j \tag{5.33}$$

The resulting sensitivity of *μ*<sup>(*j*)</sup> is

$$\begin{aligned} \frac{d\boldsymbol{\mu}^{(j)}}{d\boldsymbol{\theta}^{\mathsf{T}}} &= -\left(\mathbf{r}^{\mathsf{T}} \otimes \mathbf{I}\right)\left(\mathbf{X}^{-1} \otimes \mathbf{X}^{-1}\right)\mathcal{D}\left(\text{vec}\,\mathbf{I}\right) \left[\left(\mathbf{1}\mathbf{e}_{j}^{\mathsf{T}}\mathbf{N}^{\mathsf{T}} \otimes \mathbf{I}\right)\frac{d\,\text{vec}\,\mathbf{F}}{d\boldsymbol{\theta}^{\mathsf{T}}} + \left(\mathbf{1}\mathbf{e}_{j}^{\mathsf{T}} \otimes \mathbf{F}\right)\frac{d\,\text{vec}\,\mathbf{N}}{d\boldsymbol{\theta}^{\mathsf{T}}}\right] \\ &\quad + \mathbf{X}^{-1}\left\{\left[\left(\mathbf{N}\mathbf{U}\mathbf{N}\mathbf{e}_{j}\right)^{\mathsf{T}} \otimes \mathbf{I}\right]\frac{d\,\text{vec}\,\mathbf{F}}{d\boldsymbol{\theta}^{\mathsf{T}}} + \left[\left(\mathbf{U}\mathbf{N}\mathbf{e}_{j}\right)^{\mathsf{T}} \otimes \mathbf{F}\right]\frac{d\,\text{vec}\,\mathbf{N}}{d\boldsymbol{\theta}^{\mathsf{T}}} + \left[\left(\mathbf{N}\mathbf{e}_{j}\right)^{\mathsf{T}} \otimes \mathbf{F}\mathbf{N}\right]\frac{d\,\text{vec}\,\mathbf{U}}{d\boldsymbol{\theta}^{\mathsf{T}}} + \left[\mathbf{e}_{j}^{\mathsf{T}} \otimes \mathbf{F}\mathbf{N}\mathbf{U}\right]\frac{d\,\text{vec}\,\mathbf{N}}{d\boldsymbol{\theta}^{\mathsf{T}}}\right\} \end{aligned} \tag{5.34}$$

**Hint** To derive (5.34), it helps to note that, for any vector **z**, one can write $\mathcal{D}(\mathbf{z}) = \mathbf{I} \circ \mathbf{z}\mathbf{1}^{\mathsf{T}}$. Apply this to **X**, differentiate all the terms in *μ*<sup>(*j*)</sup>, and apply the vec operator. With any luck, you will arrive at this answer. See Sect. A.5.1 for the derivation.

**The right whale** The elasticities of the generation time *μ*<sup>(1)</sup> of a calf are shown in Fig. 5.4. Changes in early survival (*σ*<sub>1</sub> and *σ*<sub>2</sub>) have little effect. Adult survival *σ*<sub>3</sub> and, to a lesser extent, *σ*<sub>4</sub> increase the generation time by extending the reproductive lifespan. The maturation probability *γ*<sub>2</sub> and the birth probability *γ*<sub>3</sub> have negative effects on generation time, because they speed up reproduction.

#### **5.4 The Net Reproductive Rate**

In age-classified demography, the net reproductive rate *R*<sub>0</sub> measures lifetime reproductive output. It also appears in epidemiology, where it measures the potential of a disease to spread (e.g., Diekmann et al. 1990; van den Driessche and Watmough 2002). The classical net reproductive rate satisfies three conditions:

- C1: it measures the expected lifetime production of offspring by an individual,
- C2: it measures the growth rate of the population per generation,
- C3: it indicates population growth or decline: *R*<sub>0</sub> is greater than, equal to, or less than 1 exactly when *λ* is greater than, equal to, or less than 1.

In classical demography (Lotka 1939; Rhodes 1940),

$$R_0 = \int_0^\infty \ell(x)\, m(x)\, dx \tag{5.35}$$

where *ℓ(x)* is survivorship to age *x* and *m(x)* is the maternity function. It is not difficult to show that *R*<sub>0</sub> defined in this way satisfies conditions C1, C2, and C3.

In stage-classified models, however, the calculation of *R*<sup>0</sup> must account for the multiple pathways that an individual may follow through the life cycle, and the production of multiple kinds of offspring along each of these pathways. Rogers (1974; see also Lebreton 1996) considered *R*<sup>0</sup> in the context of an age-classified population distributed across a set of spatial regions. However, these calculations assume that age-specific survival and fertility schedules are available for each region. A more general solution was provided by Cushing and Zhou (1994) for stage-classified populations with no age-specific information. Their analysis produces an index that satisfies as many as possible of the conditions C1, C2, and C3. de Camino-Beck and Lewis (2007, 2008) have derived graph-theoretic ways to calculate *R*0.

Consider an initial cohort at *t* = 0 with structure **x**<sub>0</sub>, and call this the first generation. This cohort will produce offspring according to **Fx**<sub>0</sub>. The survivors of the cohort at *t* = 1 will produce offspring according to **FUx**<sub>0</sub>. The survivors at *t* = 2 will produce offspring **FU**<sup>2</sup>**x**<sub>0</sub>, and so on. The second generation is composed of all the offspring of the first generation, obtained by summing over the lifetime of the cohort

$$\mathbf{x}(1) = \left(\mathbf{F}\sum_{i=0}^{\infty} \mathbf{U}^{i}\right)\mathbf{x}_{0} = \left(\mathbf{F}\mathbf{N}\right)\mathbf{x}_{0} \tag{5.36}$$

Iterating this process leads to a model for the growth from one generation to the next

$$\mathbf{x}(k+1) = \mathbf{F}\mathbf{N}\mathbf{x}(k)\tag{5.37}$$

Cushing and Zhou (1994) define *R*<sup>0</sup> as the per-generation growth rate, given by the dominant eigenvalue *ρ* of **FN**,

$$R_0 = \rho\left[\mathbf{FN}\right] \tag{5.38}$$

Thus the Cushing-Zhou measure of *R*<sup>0</sup> clearly satisfies condition C2. Cushing and Zhou (1994) also prove (their Theorem 3) that *R*<sup>0</sup> defined in this way is less than, equal to, or greater than 1 if and only if *λ* is less than, equal to, or greater than one, respectively, thus satisfying condition C3.
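The Cushing-Zhou calculation is simple to carry out numerically. The following Python/NumPy sketch uses hypothetical matrices (not the right whale parameters) with a single offspring type born into stage 1.

```python
import numpy as np

# Hypothetical matrices; offspring are born only into stage 1.
U = np.array([[0.1, 0.0, 0.0],
              [0.6, 0.4, 0.0],
              [0.0, 0.5, 0.85]])
F = np.zeros((3, 3))
F[0, :] = [0.0, 0.3, 1.2]         # stage-specific fertilities

N = np.linalg.inv(np.eye(3) - U)  # fundamental matrix
FN = F @ N                        # generation growth matrix of (5.37)
R0 = np.max(np.abs(np.linalg.eigvals(FN)))   # Eq. (5.38): R0 = rho[FN]

# With a single offspring type, only the first row of FN is nonzero, so
# R0 equals (FN)_{11}: fertilities weighted by expected occupancy times.
print("R0 =", R0, "  (FN)_11 =", FN[0, 0])
```

The equality of *R*<sub>0</sub> and the (1, 1) entry of **FN** in this single-offspring-type case is exactly the situation described for the right whale in (5.40).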

The relation between lifetime offspring production and *R*<sub>0</sub> (condition C1) is more complicated when the life cycle contains multiple types of offspring. If only a single type of offspring is produced (call it stage 1), then **F** will have nonzero entries only in its first row, and **FN** will be upper triangular, with its dominant eigenvalue appearing in the (1, 1) position: the sum of the fertilities of each stage weighted by the expected time spent in that stage. This is precisely the expected lifetime offspring production, so for the case of a single type of offspring, the Cushing-Zhou *R*<sub>0</sub> also satisfies C1.

However, if the life cycle contains multiple types of offspring (say stages 1*,...,h*), the upper left *h* × *h* corner of **FN** will contain the expected lifetime production of offspring of types 1*,...,h* by individuals starting life as types 1*,...,h*. Since such a life cycle contains more than one kind of expected lifetime production of offspring, *R*<sup>0</sup> cannot satisfy C1 in the sense of being *the* expected lifetime reproduction. Instead, *R*<sup>0</sup> is calculated from all these expectations (as the dominant eigenvalue of this *h* × *h* submatrix). It determines per-generation growth and population persistence as a function of the expected lifetime production of all types of offspring in a way that satisfies C2 and C3.

**The right whale** The right whale produces only a single type of offspring. The fundamental matrix **N** is given by (5.7), the fertility matrix is given by (5.5), and the generation growth matrix is

$$\mathbf{FN} = \begin{pmatrix} 2.18 & 2.42 & 3.06 & 2.60 & 3.06 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \tag{5.39}$$

The dominant eigenvalue of **FN** is its *(*1*,* 1*)* entry

$$R_0 = \sum_j f_{1j}\, E(\nu_{j1}) = 2.18 \tag{5.40}$$

It is interesting to compare *R*<sub>0</sub> = 2.18 with *E(ν*<sub>41</sub>*)* = 4.74. Only female offspring are counted in *R*<sub>0</sub>, whereas *E(ν*<sub>41</sub>*)* counts reproductive events regardless of the sex of the offspring produced. Still, *R*<sub>0</sub> is less than half of *E(ν*<sub>41</sub>*)*, because of the less than perfect survival of calves from *t* to *t* + 1.

#### *5.4.1 Net Reproductive Rate in Periodic Environments*

Periodic time-varying models (Caswell 2001, Chapter 13) are an interesting special case of the multiple offspring type problem. In a periodic model, apparently identical offspring (e.g., seeds) produced at different phases of the cycle (e.g., seasons) are, in effect, of different types. To the extent that they face different environments, they will differ in their expected offspring production, and *R*<sub>0</sub> will differ depending on the phase of the cycle in which it is calculated.

The net reproductive rate in a periodic environment was calculated by Hunter and Caswell (2005a) in a study of the sooty shearwater, a pelagic seabird nesting on offshore islands in New Zealand. In that study, the year was divided into two short phases, during which breeding and harvest of chicks occur, and a longer phase encompassing the rest of the year. Let **B**<sub>*i*</sub> = **U**<sub>*i*</sub> + **F**<sub>*i*</sub> be the projection matrix in phase *i* of the cycle. Without loss of generality, consider an environment with a period of 2 (e.g., winter and summer). The population is projected over a year, starting in phase 1, by

$$\mathbf{A}_1 = \mathbf{B}_2\mathbf{B}_1 \tag{5.41}$$

which is decomposed as

$$\begin{aligned} \mathbf{A}_1 &= (\mathbf{U}_2 + \mathbf{F}_2)(\mathbf{U}_1 + \mathbf{F}_1) \\ &= \mathbf{U}_2\mathbf{U}_1 + \mathbf{U}_2\mathbf{F}_1 + \mathbf{F}_2\mathbf{U}_1 + \mathbf{F}_2\mathbf{F}_1 \end{aligned} \tag{5.42}$$

The first term includes only transitions, whereas the last three terms all describe some aspect of reproduction. Thus the annual matrix is $\mathbf{A}_1 = \widehat{\mathbf{U}}_1 + \widehat{\mathbf{F}}_1$, where

$$\widehat{\mathbf{U}}_1 = \mathbf{U}_2\mathbf{U}_1 \tag{5.43}$$

$$\widehat{\mathbf{F}}_1 = \mathbf{U}_2\mathbf{F}_1 + \mathbf{F}_2\mathbf{U}_1 + \mathbf{F}_2\mathbf{F}_1 \tag{5.44}$$

and

$$R_0^{(1)} = \rho\left[\widehat{\mathbf{F}}_1\left(\mathbf{I} - \widehat{\mathbf{U}}_1\right)^{-1}\right] \tag{5.45}$$

where the superscript 1 indicates that this is the net reproductive rate of a generation beginning in season 1. The corresponding matrices for a generation starting in season 2 are obtained from

$$\mathbf{A}\_2 = \mathbf{B}\_1 \mathbf{B}\_2 \tag{5.46}$$

and lead to a net reproductive rate *R*<sub>0</sub><sup>(2)</sup>. It is easily verified that *R*<sub>0</sub><sup>(1)</sup> ≠ *R*<sub>0</sub><sup>(2)</sup> in general. This contrasts with the population growth rate *λ*, which is independent of cyclic permutation of the seasons. However, since *λ* is the same for **A**<sub>1</sub> and **A**<sub>2</sub>, it must be the case that *R*<sub>0</sub><sup>(1)</sup> and *R*<sub>0</sub><sup>(2)</sup> are both greater than or less than 1 together.
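Equations (5.43)-(5.45) are easy to sketch for a two-phase cycle. The Python/NumPy fragment below uses hypothetical seasonal matrices (not the sooty shearwater model); the two starting-season values need not coincide, but by the argument above they must fall on the same side of 1.

```python
import numpy as np

def r0_starting_phase(U_a, F_a, U_b, F_b):
    """R0 for a generation starting in the phase with matrices (U_a, F_a)
    and passing next through (U_b, F_b), following Eqs. (5.43)-(5.45)."""
    Uhat = U_b @ U_a
    Fhat = U_b @ F_a + F_b @ U_a + F_b @ F_a
    s = Uhat.shape[0]
    M = Fhat @ np.linalg.inv(np.eye(s) - Uhat)
    return np.max(np.abs(np.linalg.eigvals(M)))

# Hypothetical 2-stage seasonal matrices (period 2)
U1 = np.array([[0.2, 0.0], [0.5, 0.7]])
F1 = np.array([[0.0, 0.8], [0.0, 0.0]])
U2 = np.array([[0.3, 0.0], [0.4, 0.6]])
F2 = np.array([[0.0, 1.5], [0.0, 0.0]])

R0_1 = r0_starting_phase(U1, F1, U2, F2)   # generation starting in season 1
R0_2 = r0_starting_phase(U2, F2, U1, F1)   # generation starting in season 2
print(R0_1, R0_2, (R0_1 > 1) == (R0_2 > 1))
```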

An alternative formulation of *R*<sub>0</sub> in periodic environments was published at the same time as Caswell (2009), by Bacaër (2009). He wrote the model, using methods equivalent to those in Sect. 5.5 below, by jointly classifying individuals by stage and by their phase within a seasonal cycle. Let **A**<sub>*i*</sub> = **U**<sub>*i*</sub> + **F**<sub>*i*</sub> be the projection matrix in season *i*. Then, for example with three seasons, the projection matrix would take the block-circulant form

$$
\tilde{\mathbf{A}} = \begin{pmatrix} 0 & 0 & \mathbf{A}\_3 \\ \mathbf{A}\_1 & 0 & 0 \\ 0 & \mathbf{A}\_2 & 0 \end{pmatrix} \tag{5.47}
$$

(with similar formulations for **U**˜ and **F**˜). After some manipulations, Bacaër shows that *R*<sub>0</sub> is the dominant eigenvalue of the matrix<sup>3</sup>

$$R_0 = \rho\left[\tilde{\mathbf{F}}\left(\mathbf{I} - \tilde{\mathbf{U}}\right)^{-1}\right] \tag{5.48}$$

<sup>3</sup>It might be easier to apply the Cushing-Zhou theorem directly to **A**˜ and write

$$\begin{pmatrix} \mathbf{F}_1 & 0 & 0 \\ 0 & \mathbf{F}_2 & 0 \\ 0 & 0 & \mathbf{F}_3 \end{pmatrix} \begin{pmatrix} -\mathbf{U}_1 & \mathbf{I} & 0 \\ 0 & -\mathbf{U}_2 & \mathbf{I} \\ \mathbf{I} & 0 & -\mathbf{U}_3 \end{pmatrix}^{-1} \tag{5.49}$$

but Bacaër does not do this.

Bacaër (2009) proves that *R*<sub>0</sub> calculated in this way satisfies condition C3, providing an indicator for population growth (*R*<sub>0</sub> > 1) or decline (*R*<sub>0</sub> < 1). However, this definition of *R*<sub>0</sub> does not satisfy C1, because it does not distinguish the different lifetime reproductive output of individuals born in different seasons.

Cushing and Ackleh (2012) returned to this issue. They argue that the standard approach for studying the dynamics of periodic models is the "periodic composite map": the product of the phase-specific matrices, as in (5.41), which projects over the entire cycle rather than from one season to the next. They separate transitions and reproduction as in Eqs. (5.43) and (5.44), and prove that *R*<sub>0</sub> calculated in this way satisfies C1 (with a different lifetime reproductive output for each starting season) and C3 (so that the values of *R*<sub>0</sub> for each starting season agree in their determination of positive or negative growth). Cushing and Ackleh (2012) also explore the net reproductive rate in nonlinear models, in which *R*<sub>0</sub> calculated at zero density determines whether the extinction equilibrium is stable.

In the end, it is valuable to have two different ways of calculating *R*<sub>0</sub>, but the existence of alternatives highlights the need to specify carefully which properties one wants the index to have.

#### *5.4.2 Sensitivity of the Net Reproductive Rate*

Since *R*<sub>0</sub> is obtained as an eigenvalue, its sensitivity to parameter changes is easy to derive. Let **x** and **y** be the left and right eigenvectors of **FN** corresponding to *R*<sub>0</sub>, scaled so that **x**<sup>T</sup>**y** = 1. Then (Caswell 2006) the sensitivity of *R*<sub>0</sub> is

$$\frac{d\boldsymbol{R}\_0}{d\boldsymbol{\theta}^\mathsf{T}} = \left(\mathbf{y}^\mathsf{T}\mathbf{N}^\mathsf{T}\otimes\mathbf{x}^\mathsf{T}\right)\frac{d\mathbf{vec}\,\mathbf{F}}{d\boldsymbol{\theta}^\mathsf{T}} + \left(\mathbf{y}^\mathsf{T}\mathbf{N}^\mathsf{T}\otimes\mathbf{x}^\mathsf{T}\mathbf{FN}\right)\frac{d\mathbf{vec}\,\mathbf{U}}{d\boldsymbol{\theta}^\mathsf{T}}\tag{5.50}$$

The first term captures the effects of changing fertility, the second term captures effects of changes in survival and transitions. The derivation of (5.50) is given in Appendix A.4.

**Hint** To derive (5.50), write *R*<sub>0</sub> = *ρ*[**FN**] and write *dR*<sub>0</sub> in terms of the right and left eigenvectors of **FN** and the differential of **FN**. Then expand *d(***FN***)* = *(d***F***)***N** + **F***d(***N***)* and apply the vec operator and the chain rule.
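Equation (5.50) can be checked numerically. The sketch below, in Python/NumPy, uses a hypothetical two-stage model with parameters *θ* = (*s*<sub>1</sub>, *s*<sub>2</sub>, *f*); here **y** denotes the right and **x** the left eigenvector of **FN**, scaled so that **x**<sup>T</sup>**y** = 1, which is the scaling (5.50) requires.

```python
import numpy as np

# Hypothetical 2-stage model, theta = (s1, s2, f): juveniles mature with
# probability s1, adults survive with probability s2 and produce f offspring.
def U_of(theta):
    s1, s2, f = theta
    return np.array([[0.0, 0.0],
                     [s1,  s2]])

def F_of(theta):
    s1, s2, f = theta
    return np.array([[0.0, f],
                     [0.0, 0.0]])

def dvec(fun, theta, h=1e-7):
    """Finite-difference Jacobian d vec M(theta) / d theta^T."""
    base = fun(theta).flatten('F')          # column-major flattening = vec
    cols = []
    for k in range(theta.size):
        tp = theta.copy(); tp[k] += h
        cols.append((fun(tp).flatten('F') - base) / h)
    return np.column_stack(cols)

theta = np.array([0.5, 0.8, 1.2])
U, F = U_of(theta), F_of(theta)
N = np.linalg.inv(np.eye(2) - U)
FN = F @ N

# Right eigenvector y and left eigenvector x of FN, scaled so x^T y = 1
vals, vecs = np.linalg.eig(FN)
y = np.real(vecs[:, np.argmax(np.abs(vals))])
lvals, lvecs = np.linalg.eig(FN.T)
x = np.real(lvecs[:, np.argmax(np.abs(lvals))])
x = x / (x @ y)

# Eq. (5.50): fertility term plus survival/transition term
dR0 = (np.kron(N @ y, x) @ dvec(F_of, theta)
       + np.kron(N @ y, FN.T @ x) @ dvec(U_of, theta))
print(dR0)   # for this model R0 = f*s1/(1-s2), so dR0 is approx (6, 15, 2.5)
```

For this simple model *R*<sub>0</sub> = *f s*<sub>1</sub>/(1 − *s*<sub>2</sub>), so the analytical gradient is available as a check.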

**The right whale** The elasticity of *R*<sub>0</sub> is shown in Fig. 5.5; *R*<sub>0</sub> is most elastic to *σ*<sub>3</sub>, less so to *σ*<sub>2</sub> and *σ*<sub>4</sub>. Remarkably, the elasticity of *R*<sub>0</sub> to the birth probability *γ*<sub>3</sub> is zero (actually, ∼10<sup>−9</sup>). This is a case where lifetime reproductive output is affected strongly by survival, slightly by maturation, but not at all by the probability of breeding given survival. This seems to be a consequence of the lower survival probability of mothers; an increase in *γ*<sub>3</sub> increases the probability of reproduction, but reduces the lifetime over which that reproduction will be realized.

#### *5.4.3 Invasion Exponents, Selection Gradients, and R*<sub>0</sub>

Selection on life history traits can be studied in terms of the invasion exponent, which measures the rate at which a mutation, introduced at low densities, will increase in the environment created by a resident phenotype (Metz et al. 1992; Ferrière and Gatto 1993); for a recent introduction see Otto and Day (2007). The selection gradient on a trait is the derivative of the invasion exponent with respect to the value of the trait. If the derivative is positive, selection favors an increase in the trait, and vice versa. The invasion exponent in a density-independent model is given by log *λ*. In a density-dependent model, the invasion exponent is given by the growth rate at equilibrium, *λ*[**n**ˆ]. The net reproductive rate *R*<sub>0</sub> is not, strictly speaking, an invasion exponent, but because it measures expected lifetime reproduction, it is attractive as a measure of fitness (see, e.g., the discussion in Kozlowski 1999). Using *R*<sub>0</sub> as a measure of fitness will lead to erroneous conclusions unless the selection gradients, measured in terms of *λ* and of *R*<sub>0</sub>, give the same answers, i.e., unless *dR*<sub>0</sub>*/dθ* ∝ *d* log *λ/dθ*.

For an age-classified model, we write *R*<sub>0</sub> in terms of the net maternity function *φ(x, θ)* = *ℓ(x, θ)m(x, θ)*, where both survival and reproduction depend on some parameter *θ*. Then

$$R_0(\theta) = \int_0^\infty \phi(x, \theta)\, dx \tag{5.51}$$

The growth rate *r* = log *λ* is the solution to

$$1 = \int_0^\infty \phi(x, \theta)\, e^{-r(\theta)x}\, dx \tag{5.52}$$

Differentiating (5.51) and (5.52) gives

$$\frac{dR_0}{d\theta} = \int_0^\infty \frac{d\phi(x, \theta)}{d\theta}\, dx \tag{5.53}$$

$$\frac{dr}{d\theta} = \frac{\displaystyle\int_0^\infty e^{-rx}\,\frac{d\phi(x, \theta)}{d\theta}\, dx}{\displaystyle\int_0^\infty x\,\phi(x, \theta)\, e^{-rx}\, dx} \tag{5.54}$$

Equation (5.54) is Hamilton's (1966) famous result; the denominator is the generation time measured as the average age of reproduction in the stable age distribution (see Chap. 3).

When *R*<sub>0</sub> = 1 and *r* = 0, it follows from (5.53) and (5.54) that the gradients *dr/dθ* and *dR*<sub>0</sub>*/dθ* are proportional. Use of either will lead to the same conclusions about selection. But when *r* ≠ 0, this is not the case. If *r* > 0, then *dr/dθ* is reduced for traits that operate at later ages, because *dφ/dθ* is weighted by *e*<sup>−*rx*</sup>. It is an open problem to generalize this result to stage-classified models, and prove that

$$\frac{d\log\lambda}{d\theta^{\mathsf{T}}} \propto \frac{dR\_0}{d\theta^{\mathsf{T}}}\tag{5.55}$$

when *λ* = *R*<sub>0</sub> = 1. In a few cases I have examined, it appears to be true numerically. As the following example shows, it is certainly the case that when *λ* ≠ 1, the derivatives are not generally proportional.

**The right whale** The lack of proportionality between the selection gradients in terms of *λ* and of *R*<sub>0</sub> means that evolutionary conclusions will differ depending on which is used, especially when tradeoffs exist between two or more traits. For example, for the right whale, *λ* = 1.025 and *R*<sub>0</sub> = 2.183. Figure 5.6 shows the sensitivity of *λ* and of *R*<sub>0</sub>; while the patterns are similar, they are *not* proportional, and the use of *R*<sub>0</sub> as an invasion exponent would result in erroneous predictions. Suppose a trait existed that would increase the birth probability *γ*<sub>3</sub> at the cost of a reduction in calf survival *σ*<sub>1</sub>, with the cost measured by *c* = −*dσ*<sub>1</sub>*/dγ*<sub>3</sub>. An increase in this trait would be favored by selection provided that

$$c < \frac{\partial\lambda/\partial\gamma_3}{\partial\lambda/\partial\sigma_1} = 0.96 \tag{5.56}$$

**Fig. 5.6** (**a**) The sensitivity, to each of the vital rates, of the net reproductive rate *R*<sup>0</sup> for the right whale. (**b**) The sensitivity of population growth rate *λ*. The derivative of *λ* is the selection gradient; use of the derivative of *R*<sup>0</sup> leads to erroneous predictions unless the population is at equilibrium. Parameters as in Fig. 5.2

But if expected lifetime reproduction was used as an invasion exponent, the analysis would conclude that selection would favor an increase in the trait only if

$$c < \frac{\partial R_0/\partial\gamma_3}{\partial R_0/\partial\sigma_1} = 0.0 \tag{5.57}$$

That is, according to *R*<sub>0</sub>, any cost whatsoever of increased birth rate would prevent selection from favoring it. According to *λ* (and correctly, in this case), selection would favor increased birth rate provided that the cost was not too great. In spite of the superficial similarity of the patterns in Fig. 5.6, the evolutionary implications are quite different, reflecting the impact of *timing* of life history events on *λ*. The sensitivities of *λ* to *σ*<sub>2</sub> and *γ*<sub>2</sub>, which influence early survival and the age at maturity, are larger than the sensitivities of *R*<sub>0</sub> to the same parameters.

#### *5.4.4 Beyond R*<sub>0</sub>*: Individual Stochasticity in Lifetime Reproduction*

Variation among individuals is fundamental to population biology. As argued here, two sources of variation must be distinguished: heterogeneity and individual stochasticity. Heterogeneity refers to genuine differences among individuals, because of which the individuals experience different vital rates. Individual stochasticity refers to the apparent differences that result from the random outcomes of identical vital rates applied to identical individuals. We have seen above that individual stochasticity is always present. That is particularly true of lifetime reproductive output (LRO). The net reproductive rate is the *expectation* of LRO, but what can we say about the variance among individuals?

Empirical measurement shows that LRO is usually highly variable among individuals and positively skewed. Typically, a few individuals produce many offspring while most produce few, or none at all (Clutton-Brock 1988; Newton 1989). If this variance reflected heterogeneity among individual properties, and if the heterogeneity had a genetic basis, the variance would provide material for natural selection (the "opportunity for selection" of Crow 1958). Population and quantitative genetics are replete with methods to measure such genetic variation; e.g., Lande and Arnold (1983) and Endler (1986).

However, variance among individuals in LRO is not evidence of heterogeneity, genetic or otherwise; some is due to individual stochasticity. Only after evaluating the extent of individual stochasticity can data on LRO be interpreted as evidence for heterogeneity (Caswell 2011; Tuljapurkar et al. 2009; Steiner et al. 2010; Steiner and Tuljapurkar 2012). Caswell (2011) developed a method to calculate the mean, variance, and higher moments of lifetime reproductive output for any age- or stage-classified life cycle, using Markov chains with rewards; see van Daalen and Caswell (2015, 2017) for full details. In these models<sup>4</sup> the movement of the individual through its life cycle is described by an absorbing Markov chain; mortality appears as transitions to an absorbing (dead) state. At each step, the individual accumulates a "reward." In our context, the reward is the production of offspring. The reproductive reward is a random variable with a specified set of moments. The reward accumulated by the time of the individual's inevitable death is its LRO. Although every individual experiences the same vital rates—there is no heterogeneity—each individual may experience a different life and thus a different lifetime reproductive output.

Stage-specific reproductive output is specified by a set of reward matrices **R***k*. The *(i, j )* element of **R***<sup>k</sup>* is the *k*th moment of the reproductive output associated with the transition from stage *j* to stage *i*. Given the reward matrices, the Markov chain transition matrix **P**, and the reasonable assumption that the dead do not reproduce, all the moments of LRO can be calculated (van Daalen and Caswell 2017).

Let *ρ̃*<sub>*k*</sub> be a vector containing the *k*th moments of LRO for individuals starting in each transient (living) stage. Then it has been shown (van Daalen and Caswell 2017) that the first two moments of LRO are

$$\tilde{\boldsymbol{\rho}}_1 = \mathbf{N}^{\mathsf{T}} \mathbf{Z} \left(\mathbf{P} \circ \mathbf{R}_1\right)^{\mathsf{T}} \mathbf{1}_{s+1} \tag{5.58}$$

$$\tilde{\boldsymbol{\rho}}_2 = \mathbf{N}^{\mathsf{T}} \left[ \mathbf{Z} \left(\mathbf{P} \circ \mathbf{R}_2\right)^{\mathsf{T}} \mathbf{1}_{s+1} + 2 \left(\mathbf{U} \circ \mathbf{R}_1\right)^{\mathsf{T}} \tilde{\boldsymbol{\rho}}_1 \right] \tag{5.59}$$

<sup>4</sup>Markov chains with rewards have a long history in stochastic process theory; see Howard (1960), Puterman (1994), and Sheskin (2010).

where *s* is the number of stages in the life cycle, ◦ denotes the Hadamard product, **N** = (**I** − **U**)<sup>−1</sup> is the fundamental matrix of the Markov chain, and **Z** is a matrix that selects the living states. From these moment vectors we can calculate all the statistics of LRO. In addition, the full sensitivity analysis, calculating the derivatives of any of the moments of LRO to any parameters affecting any of the transition, mortality, or reward matrices, has been presented by van Daalen and Caswell (2017).
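Equations (5.58)-(5.59) can be sketched as follows. This Python/NumPy fragment uses a hypothetical two-stage life cycle in which stage 2 produces a Poisson-distributed number of offspring per step; in the (**U** ◦ **R**<sub>1</sub>) term, the transient (living) block of the reward matrix is used, which is one reasonable reading of the formula.

```python
import numpy as np

# Hypothetical 2-stage life cycle plus an absorbing "dead" state.
U = np.array([[0.0, 0.0],
              [0.7, 0.8]])               # transient (living) transitions
s = U.shape[0]
m = 1.0 - U.sum(axis=0)                  # stage-specific death probabilities
P = np.zeros((s + 1, s + 1))             # column-stochastic transition matrix
P[:s, :s] = U
P[s, :s] = m
P[s, s] = 1.0                            # death is absorbing

# Per-step reward: stage 2 produces a Poisson(1.5) number of offspring on
# each transition, so the first two moments of the per-step reward are
r1 = np.array([0.0, 1.5])                # means, by current stage
r2 = r1 + r1**2                          # Poisson: E(r^2) = mean + mean^2
R1 = np.zeros((s + 1, s + 1)); R1[:, :s] = r1
R2 = np.zeros((s + 1, s + 1)); R2[:, :s] = r2

Z = np.hstack([np.eye(s), np.zeros((s, 1))])    # selects the living states
N = np.linalg.inv(np.eye(s) - U)
ones = np.ones(s + 1)

rho1 = N.T @ Z @ (P * R1).T @ ones                       # Eq. (5.58)
rho2 = N.T @ (Z @ (P * R2).T @ ones
              + 2 * (U * R1[:s, :s]).T @ rho1)           # Eq. (5.59)
var = rho2 - rho1**2
print("E(LRO):", rho1)       # mean lifetime reproductive output by stage
print("Var(LRO):", var)      # variance due to individual stochasticity alone
```

For an individual starting in stage 2 the result can be checked against the compound-geometric formula Var = *E(T)*Var(*r*) + Var(*T*)*E(r)*<sup>2</sup>, where *T* is the (geometric) number of steps spent alive in stage 2.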

One of the most significant findings of this line of research has been that, in many cases, individual stochasticity can account for most or all of the observed phenotypic variance in LRO (Steiner and Tuljapurkar 2012; van Daalen and Caswell 2017). It appears that the contribution of stochasticity to variance in lifetime reproductive output has been underappreciated.

#### **5.5 Variable and Stochastic Environments**

The variance due to individual stochasticity can be examined in the case of variable environments (Caswell 2006; Tuljapurkar and Horvitz 2006; Horvitz and Tuljapurkar 2008; see also Chap. 8). Several cases can be considered:

- deterministic aperiodic environments,
- periodic environments,
- independent and identically distributed (iid) random environments,
- Markovian stochastic environments.
See Tuljapurkar (1990) for a thorough discussion of types of stochastic environments.

When studying variable environments, it is important to distinguish *period* and *cohort* calculations. Period calculations are based on the vital rates in a given year. They describe the results of the hypothetical situation where the conditions of year *t* are maintained indefinitely, and compare those to the results for conditions in year *t* + 1, etc. Period calculations are a way to summarize the effects of changing environment. But an individual born in year *t* does not live its life under the conditions of year *t*. It spends its first year of life under the conditions in year *t*, its second year under the conditions of year *t* + 1, and so on. Results calculated in this way are called *cohort* calculations, because they describe a cohort born in year *t* and living through the environmental sequence starting then. Period-specific calculations are easy; simply apply the time-invariant calculation to the vital rates of each year and tabulate the results. Cohort calculations, however, must account for all the possible environmental sequences through which a cohort may pass. Caswell (2006) and Tuljapurkar and Horvitz (2006) independently introduced two different, complementary approaches to doing so. I will present the former approach here.

#### *5.5.1 A Model for Variable Environments*

In a variable environment, the transient matrix **U** is a time-varying matrix **U***(t)*. We can define a fundamental matrix by

$$\mathbf{N} = \mathbf{I} + \mathbf{U}(0) + \mathbf{U}(1)\mathbf{U}(0) + \mathbf{U}(2)\mathbf{U}(1)\mathbf{U}(0) + \cdots \tag{5.60}$$

The *(i, j )* element of **N** is the expected occupancy time in transient state *i* by an individual starting in transient state *j* at time 0, and experiencing the specific sequence of environments **U***(*0*),* **U***(*1*), . . .*. Thus there will be a different matrix **N** for each possible environmental sequence.
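For a specific environmental sequence, the sum in (5.60) can be truncated once the matrix products become negligible. The following Python/NumPy sketch uses two hypothetical environmental states and, as a simple device, holds the environment at its last listed state after the sequence ends.

```python
import numpy as np

# Two hypothetical environmental states, each with its own 2x2 transient
# matrix (column sums < 1).
U_env = [np.array([[0.1, 0.0], [0.6, 0.7]]),
         np.array([[0.3, 0.0], [0.2, 0.5]])]

def fundamental_sequence(env_seq, s=2, tol=1e-12, max_terms=10000):
    """Truncation of Eq. (5.60): N = I + U(0) + U(1)U(0) + ...
    for one specific environmental sequence (held at its last state)."""
    N = np.eye(s)
    prod = np.eye(s)
    for t in range(max_terms):
        env = env_seq[min(t, len(env_seq) - 1)]
        prod = U_env[env] @ prod         # the product U(t)...U(1)U(0)
        N += prod
        if prod.max() < tol:             # remaining terms are negligible
            break
    return N

# Occupancy times depend on the sequence of environments experienced:
print(fundamental_sequence([0, 1, 0, 1]))
print(fundamental_sequence([1, 0, 1, 0]))
```

Comparing the two printed matrices illustrates the point in the text: there is a different **N** for each possible environmental sequence.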

Tuljapurkar and Horvitz (2006), whose paper I highly recommend, work directly from (5.60) to develop the means and variances of **N**, *η*, and survivorship, in periodic, iid, and Markovian environments. Here, we consider an approach in which an individual is jointly classified by stage and environment, using the vec-permutation model developed by Hunter and Caswell (2005b).

Suppose that there are *q* environmental states, *i* = 1*,...,q*, and *s* stages, *j* = 1*,...,s*. Corresponding to environment *i* is an *s* × *s* transient matrix **U**<sub>*i*</sub>. Assemble the matrices **U**<sub>*i*</sub> into a block-diagonal matrix

$$\mathbb{U} = \begin{pmatrix} \mathbf{U}_{1} & & \\ & \ddots & \\ & & \mathbf{U}_{q} \end{pmatrix} \tag{5.61}$$

of dimension *sq* × *sq*.

The transitions among environmental states are defined by a *q* × *q* column-stochastic matrix **D**. Use the matrix **D** to construct a block-diagonal environmental transition matrix

$$\mathbb{D} = \begin{pmatrix} \mathbf{D} & 0 & \cdots & 0 \\ 0 & \mathbf{D} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \mathbf{D} \end{pmatrix} \tag{5.62}$$

of dimension *sq* × *sq*.

Suppose that there are 4 environmental states. In an aperiodic deterministic environment,

$$\mathbf{D} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix} \tag{5.63}$$

That is, the environment moves deterministically from state 1 to state 2 to state 3 to state 4. Setting *d*<sub>44</sub> = 1 solves the problem of what to do at the end of the sequence, by the (possibly satisfactory) trick of letting the final state repeat indefinitely. In a periodic environment,

$$\mathbf{D} = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \tag{5.64}$$

In an iid environment in which environment *i* occurs with probability *πi*,

$$\mathbf{D} = \begin{pmatrix} \pi_1 & \pi_1 & \pi_1 & \pi_1 \\ \pi_2 & \pi_2 & \pi_2 & \pi_2 \\ \pi_3 & \pi_3 & \pi_3 & \pi_3 \\ \pi_4 & \pi_4 & \pi_4 & \pi_4 \end{pmatrix} \tag{5.65}$$

In a Markovian environment, **D** is a column-stochastic transition matrix describing the transition probabilities. I will assume that the environmental Markov chain is ergodic, with a stationary probability distribution denoted by *π*. This gives the long-term frequency of occurrence of each environmental state.

The state of the cohort can be specified by a matrix **X**, of dimension *s* × *q*, with rows corresponding to stages and columns to environments, and where *x*<sub>*ij*</sub>*(t)* is the expected number of individuals in stage *i* and environmental state *j* at time *t*.

$$\mathbf{X}(t) = \begin{pmatrix} x_{11} & \cdots & x_{1q} \\ \vdots & & \vdots \\ x_{s1} & \cdots & x_{sq} \end{pmatrix} \tag{5.66}$$

We rearrange **X** into a vector by applying the vec operator to **X**<sup>T</sup>,

$$\text{vec}\,\mathbf{X}^{\mathsf{T}} = \left(x_{11}\ \cdots\ x_{1q}\ \middle|\ \cdots\ \middle|\ x_{s1}\ \cdots\ x_{sq}\right)^{\mathsf{T}} \tag{5.67}$$

The first block of entries gives stage 1 individuals in environments 1 through *q*. The second block gives stage 2 individuals in environments 1 through *q*, and so on.

To describe the dynamics of the cohort, suppose that individuals first move among stages, according to the vital rates determined by the current environment, and then the environment changes to a new state according to **D**. Then

$$\operatorname{vec}\mathbf{X}^{\mathsf{T}}(t+1) = \mathbb{D}\,\mathbf{K}_{s,q}\,\mathbb{U}\,\mathbf{K}_{s,q}^{\mathsf{T}}\,\operatorname{vec}\mathbf{X}^{\mathsf{T}}(t) \tag{5.68}$$

The matrix $\mathbf{K}_{s,q}$ is the vec-permutation matrix (Henderson and Searle 1981; Hunter and Caswell 2005b), also called the commutation matrix (Magnus and Neudecker 1979), which permutes the entries of a vector so that

$$\operatorname{vec}\mathbf{X}^{\mathsf{T}} = \mathbf{K}_{s,q}\operatorname{vec}\mathbf{X} \tag{5.69}$$

(see Sect. 2.2.3). Like all permutation matrices, its transpose is also its inverse. Its role here is to rearrange the population vector into a form appropriate for multiplication by the block-diagonal matrices $\mathbb{U}$ and $\mathbb{D}$.

Working from right to left, (5.68) first rearranges the vector, then applies the block-diagonal demographic transition matrix $\mathbb{U}$, then reverses the rearrangement of the vector, and finally applies the block-diagonal environmental transition matrix $\mathbb{D}$ to obtain the expected cohort at *t* + 1. This gives a transition matrix for the joint process,

$$\widetilde{\mathbf{U}} = \mathbb{D}\,\mathbf{K}_{s,q}\,\mathbb{U}\,\mathbf{K}_{s,q}^{\mathsf{T}} \tag{5.70}$$

that incorporates the demographic transitions within each environment and the patterns of time variation among environments.5 Here and in what follows, the tilde distinguishes the matrix from the environment-specific matrices.
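Equation (5.70) can be assembled directly. The sketch below (the environment-specific matrices are random stand-ins and **D** is invented; `vec_perm` is a helper constructing $\mathbf{K}_{s,q}$ from its defining permutation) builds $\widetilde{\mathbf{U}}$ and verifies the vec-permutation property (5.69).

```python
import numpy as np

def vec_perm(s, q):
    # Vec-permutation matrix K_{s,q}: vec(X^T) = K @ vec(X) for an s-by-q matrix X
    K = np.zeros((s * q, s * q))
    for i in range(s):
        for j in range(q):
            K[j + i * q, i + j * s] = 1.0   # entry x_ij: column-order -> row-order
    return K

s, q = 3, 2
rng = np.random.default_rng(0)

# stand-in environment-specific transient matrices U_1..U_q (column sums < 1)
U_list = [rng.uniform(0.0, 0.3, (s, s)) for _ in range(q)]
# invented column-stochastic environmental transition matrix D
D = np.array([[0.7, 0.4],
              [0.3, 0.6]])

K = vec_perm(s, q)
bbU = np.zeros((s * q, s * q))              # blockdiag(U_1, ..., U_q)
for j, Uj in enumerate(U_list):
    bbU[j * s:(j + 1) * s, j * s:(j + 1) * s] = Uj
bbD = np.kron(np.eye(s), D)                 # the same D applied to every stage

U_tilde = bbD @ K @ bbU @ K.T               # eq (5.70)

# K is a permutation matrix (transpose = inverse) and realizes eq (5.69)
X = rng.standard_normal((s, q))
assert np.allclose(K @ K.T, np.eye(s * q))
assert np.allclose(K @ X.flatten(order='F'), X.T.flatten(order='F'))
```

Because $\mathbb{D}$ is column-stochastic and $\mathbf{K}_{s,q}$ is a permutation, the column sums of $\widetilde{\mathbf{U}}$ equal the (permuted) column sums of $\mathbb{U}$, so $\widetilde{\mathbf{U}}$ remains a valid transient matrix.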

Matrices of similar form, but not using this formalism, were introduced by Horvitz to study populations in habitat patches where the habitat patches change state over time, for example in recovering from disturbance (Horvitz and Schemske 1986; Pascarella and Horvitz 1998). Horvitz introduced the term "megamatrix" to describe these models. A megamatrix, in the sense of Horvitz, is a special case of (5.70) when the population is classified by stages within environmental states, the demographic matrices are applied first, and the environmental transition matrices $\mathbf{D}_i$ are identical for all stages, as is the case in (5.62).

<sup>5</sup>Note that (5.68) computes the *expected* population at *t* + 1 from the *expected* population at *t*. It might be tempting to do this with the projection matrix **A** and use the eigenvalues of $\widetilde{\mathbf{A}}$ to calculate the stochastic population growth rate. However, this would give the growth rate of the mean population, but not the stochastic growth rate (which is always less than or equal to the growth rate of the mean population). For calculations such as moments of longevity, which are explicitly properties of the expected population, the difference does not arise.

#### *5.5.2 The Fundamental Matrix*

Since $\widetilde{\mathbf{U}}$ is the transient matrix of an absorbing Markov chain, the fundamental matrix in the time-varying environment is

$$\widetilde{\mathbf{N}} = \left(\mathbf{I}_{sq} - \widetilde{\mathbf{U}}\right)^{-1} \tag{5.71}$$

The elements of $\widetilde{\mathbf{N}}$ give the expected occupancy times in each stage, in each environment, as a function of the starting stage and starting environment.
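The calculation (5.71) is one line of code. A small sketch, with an arbitrary 2 × 2 transient matrix standing in for $\widetilde{\mathbf{U}}$, confirms the interpretation: the inverse equals the sum of expected occupancies $\sum_t \widetilde{\mathbf{U}}^t$ over all times.

```python
import numpy as np

# illustrative transient matrix (spectral radius < 1) standing in for U-tilde
Ut = np.array([[0.5, 0.1],
               [0.2, 0.6]])
N = np.linalg.inv(np.eye(2) - Ut)     # fundamental matrix, eq (5.71)

# N equals the sum of all powers of Ut: expected total occupancy times
S = sum(np.linalg.matrix_power(Ut, t) for t in range(200))
assert np.allclose(N, S)
```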

**Notation alert** Developing a complete system of notation for $\widetilde{\mathbf{N}}$ would obscure more than it would clarify. Pictures can help. As I present the fundamental matrix and some of the properties calculated from it, I will use diagrams for a simple case with three stages and two environments. I will often indicate the dimension of matrices and vectors with subscripts. I will use *g* to denote stages ($g = 1, 2, \ldots, s$) and $\epsilon$ to denote environments ($\epsilon = 1, \ldots, q$). I will use superscripts on $\widetilde{\mathbf{N}}$, and quantities derived from it, to distinguish different ways of combining information across environmental states (see Table 5.1).

Recall that in a constant environment, $\nu_{ij}$ was the number of visits to stage *i*, starting in stage *j*. Now we must consider the visits to stage *i* in environment $\epsilon$, starting in stage *j* and environment $\epsilon_0$, so we write

$$\widetilde{\mathbf{N}} = E\left(\nu_{ij,\epsilon} \,\middle|\, \epsilon_0\right) \tag{5.72}$$

**Table 5.1** Superscript notation for time-varying models. The tilde indicates quantities calculated from the complete transient matrix $\widetilde{\mathbf{U}}$ in (5.70). Occupancy times and times to absorption depend on the initial and final demographic and environmental states. The superscripts (‡, §, ♥) indicate choices of summing and averaging over the environmental states. The superscripts are shown here for the fundamental matrix $\widetilde{\mathbf{N}}$


The structure of $\widetilde{\mathbf{N}}$ when *s* = 3 and *q* = 2 is

$$\widetilde{\mathbf{N}}_{sq \times sq}:\qquad
\begin{array}{cc|cc|cc|cc}
 & & \multicolumn{2}{c|}{g=1} & \multicolumn{2}{c|}{g=2} & \multicolumn{2}{c}{g=3} \\
 & & \epsilon_0=1 & \epsilon_0=2 & \epsilon_0=1 & \epsilon_0=2 & \epsilon_0=1 & \epsilon_0=2 \\
\hline
g=1 & \epsilon=1 & & & & & & \\
    & \epsilon=2 & & & & & & \\
g=2 & \epsilon=1 & & & & & & \\
    & \epsilon=2 & & & & & & \\
g=3 & \epsilon=1 & & & & & & \\
    & \epsilon=2 & & & & & &
\end{array}$$

From $\widetilde{\mathbf{N}}$ we can obtain the expected occupancy time in each stage, regardless of the environment in which those visits occur, by aggregating rows. The resulting matrix $\widetilde{\mathbf{N}}^{\ddagger}$ is

$$\begin{aligned} \widetilde{\mathbf{N}}^{\ddagger} &= E\left(\nu_{ij} \,\middle|\, \epsilon_0\right) \\ &= \left(\mathbf{I}_s \otimes \mathbf{1}_{q \times 1}^{\mathsf{T}}\right) \widetilde{\mathbf{N}} \end{aligned} \tag{5.73}$$

where $\mathbf{1}_{q \times 1}$ is a vector of ones. The structure of $\widetilde{\mathbf{N}}^{\ddagger}$ is

$$\widetilde{\mathbf{N}}^{\ddagger}_{s \times sq}:\qquad
\begin{array}{c|cc|cc|cc}
 & \multicolumn{2}{c|}{g=1} & \multicolumn{2}{c|}{g=2} & \multicolumn{2}{c}{g=3} \\
 & \epsilon_0=1 & \epsilon_0=2 & \epsilon_0=1 & \epsilon_0=2 & \epsilon_0=1 & \epsilon_0=2 \\
\hline
g=1 & & & & & & \\
g=2 & & & & & & \\
g=3 & & & & & &
\end{array}$$

If it is useful to group stages within initial environments, rather than grouping environments within stages, $\widetilde{\mathbf{N}}^{\ddagger}$ can be rearranged as

$$\widetilde{\mathbf{N}}^{\ddagger\ddagger} = \widetilde{\mathbf{N}}^{\ddagger}\,\mathbf{K}_{s,q} \tag{5.74}$$

with the structure

$$\widetilde{\mathbf{N}}^{\ddagger\ddagger}_{s \times sq}:\qquad
\begin{array}{c|ccc|ccc}
 & \multicolumn{3}{c|}{\epsilon_0=1} & \multicolumn{3}{c}{\epsilon_0=2} \\
 & g=1 & g=2 & g=3 & g=1 & g=2 & g=3 \\
\hline
g=1 & & & & & & \\
g=2 & & & & & & \\
g=3 & & & & & &
\end{array}$$

The matrices $\widetilde{\mathbf{N}}^{\ddagger}$ and $\widetilde{\mathbf{N}}^{\ddagger\ddagger}$ both display expected occupancy of each stage as a function of initial stage and initial environment. To describe the fates of individuals without specifying their initial environment, we take an expectation over the stationary distribution $\boldsymbol{\pi}$ of initial environments. This gives

$$\begin{aligned} \widetilde{\mathbf{N}}^{\S} &= E\left[\nu_{ij,\epsilon}\right] \\ &= \widetilde{\mathbf{N}}\left(\mathbf{I}_s \otimes \boldsymbol{\pi}\right) \end{aligned} \tag{5.75}$$

The structure of $\widetilde{\mathbf{N}}^{\S}$ is

$$\widetilde{\mathbf{N}}^{\S}_{sq \times s}:\qquad
\begin{array}{cc|ccc}
 & & \multicolumn{3}{c}{\epsilon_0 = \bar{\epsilon}} \\
 & & g=1 & g=2 & g=3 \\
\hline
g=1 & \epsilon=1 & & & \\
    & \epsilon=2 & & & \\
g=2 & \epsilon=1 & & & \\
    & \epsilon=2 & & & \\
g=3 & \epsilon=1 & & & \\
    & \epsilon=2 & & &
\end{array}$$

The rows of $\widetilde{\mathbf{N}}^{\S}$ can be rearranged to display stages within environments, giving

$$\widetilde{\mathbf{N}}^{\S\S} = \mathbf{K}_{s,q}^{\mathsf{T}}\,\widetilde{\mathbf{N}}^{\S} \tag{5.76}$$

with the structure

$$\widetilde{\mathbf{N}}^{\S\S}_{sq \times s}:\qquad
\begin{array}{cc|ccc}
 & & \multicolumn{3}{c}{\epsilon_0 = \bar{\epsilon}} \\
 & & g=1 & g=2 & g=3 \\
\hline
\epsilon=1 & g=1 & & & \\
           & g=2 & & & \\
           & g=3 & & & \\
\epsilon=2 & g=1 & & & \\
           & g=2 & & & \\
           & g=3 & & &
\end{array}$$

Finally, aggregating over destination environments *and* averaging over initial environments gives a matrix containing the expected occupancy of stages as a function of initial stage, averaged over environments

$$\begin{aligned} \widetilde{\mathbf{N}}^{\heartsuit} &= E\left[\nu_{ij}\right] \\ &= \left(\mathbf{I}_s \otimes \mathbf{1}_{q \times 1}^{\mathsf{T}}\right) \widetilde{\mathbf{N}} \left(\mathbf{I}_s \otimes \boldsymbol{\pi}\right) \end{aligned} \tag{5.77}$$

The structure of $\widetilde{\mathbf{N}}^{\heartsuit}$ is

$$\widetilde{\mathbf{N}}^{\heartsuit}_{s \times s}:\qquad
\begin{array}{c|ccc}
 & \multicolumn{3}{c}{\epsilon_0 = \bar{\epsilon}} \\
 & g=1 & g=2 & g=3 \\
\hline
g=1 & & & \\
g=2 & & & \\
g=3 & & &
\end{array}$$

The matrix $\widetilde{\mathbf{N}}^{\heartsuit}$, obtained by the simple calculation (5.77), is "the" fundamental matrix for the variable environment. It could be compared directly to the fundamental matrix in a constant environment (e.g., the environment defined by one of the environmental states).
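The aggregation and averaging operators of (5.73), (5.75), and (5.77) are small Kronecker products. A sketch (with a random stand-in for $\widetilde{\mathbf{N}}$ and an invented stationary distribution) builds them for *s* = 3, *q* = 2 and checks that the two routes to $\widetilde{\mathbf{N}}^{\heartsuit}$ — aggregate then average, or average then aggregate — agree.

```python
import numpy as np

s, q = 3, 2
rng = np.random.default_rng(1)
N_tilde = rng.uniform(0.0, 2.0, (s * q, s * q))   # stand-in fundamental matrix
pi = np.array([0.6, 0.4])                         # illustrative stationary distribution

agg = np.kron(np.eye(s), np.ones((1, q)))   # (I_s ⊗ 1'): sum over current environments
avg = np.kron(np.eye(s), pi[:, None])       # (I_s ⊗ pi): average over initial environments

N_ddag  = agg @ N_tilde             # eq (5.73), s x sq
N_sect  = N_tilde @ avg             # eq (5.75), sq x s
N_heart = agg @ N_tilde @ avg       # eq (5.77), s x s

# the two orders of operation give the same N-heart
assert np.allclose(N_heart, N_ddag @ avg)
assert np.allclose(N_heart, agg @ N_sect)
```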

#### *5.5.3 Longevity in a Variable Environment*

Life expectancy, as a function of initial stage and initial environment, is obtained by summing the columns of $\widetilde{\mathbf{N}}$,

$$\begin{aligned} E\left(\widetilde{\boldsymbol{\eta}}^{\mathsf{T}}\right) &= E\left[\boldsymbol{\eta}^{\mathsf{T}} \,\middle|\, \epsilon_0\right] \\ &= \mathbf{1}_{sq \times 1}^{\mathsf{T}}\,\widetilde{\mathbf{N}} \end{aligned} \tag{5.78}$$

The structure of $E\left(\widetilde{\boldsymbol{\eta}}^{\mathsf{T}}\right)$ is

$$E\left(\widetilde{\boldsymbol{\eta}}^{\mathsf{T}}\right):\qquad
\begin{array}{cc|cc|cc}
\multicolumn{2}{c|}{g=1} & \multicolumn{2}{c|}{g=2} & \multicolumn{2}{c}{g=3} \\
\epsilon_0=1 & \epsilon_0=2 & \epsilon_0=1 & \epsilon_0=2 & \epsilon_0=1 & \epsilon_0=2 \\
\hline
 & & & & &
\end{array}$$

Averaging this conditional life expectancy over the stationary distribution *π* of initial environments gives

$$E\left(\widetilde{\boldsymbol{\eta}}^{\heartsuit}\right) = E\left(\widetilde{\boldsymbol{\eta}}^{\mathsf{T}}\right)\left(\mathbf{I}_s \otimes \boldsymbol{\pi}\right) \tag{5.79}$$

This measure of life expectancy in a variable environment is directly comparable to *E (η)* calculated from the same life history in a constant environment.
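Equations (5.78) and (5.79) can be cross-checked against the aggregated fundamental matrix: averaging the conditional life expectancies over $\boldsymbol{\pi}$ should equal summing the columns of $\widetilde{\mathbf{N}}^{\heartsuit}$. A sketch with a random stand-in for $\widetilde{\mathbf{N}}$ and an invented $\boldsymbol{\pi}$:

```python
import numpy as np

s, q = 3, 2
rng = np.random.default_rng(2)
N_tilde = rng.uniform(0.0, 2.0, (s * q, s * q))   # stand-in fundamental matrix
pi = np.array([0.6, 0.4])                         # illustrative stationary distribution

E_eta = np.ones(s * q) @ N_tilde                  # eq (5.78): one entry per (stage, env0)
E_eta_heart = E_eta @ np.kron(np.eye(s), pi[:, None])   # eq (5.79)

# consistency: average of column sums = column sums of the averaged matrix N-heart
N_heart = np.kron(np.eye(s), np.ones((1, q))) @ N_tilde @ np.kron(np.eye(s), pi[:, None])
assert np.allclose(E_eta_heart, np.ones(s) @ N_heart)
```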

#### **5.5.3.1 Variance in Longevity**

In a constant environment, the variance among individuals in longevity is due to individual stochasticity. In a time-varying environment, the variance contains an additional component due to differences among individuals as a function of their environment at birth. Applying (5.27) to $\widetilde{\mathbf{N}}$, we obtain the variances conditional on the initial environment:

$$V\left[\widetilde{\boldsymbol{\eta}}^{\mathsf{T}} \,\middle|\, \epsilon_0\right] = E\left(\widetilde{\boldsymbol{\eta}}^{\mathsf{T}}\right)\left(2\widetilde{\mathbf{N}} - \mathbf{I}_{sq}\right) - E\left(\widetilde{\boldsymbol{\eta}}^{\mathsf{T}}\right) \circ E\left(\widetilde{\boldsymbol{\eta}}^{\mathsf{T}}\right) \tag{5.80}$$

As indicated by the notation, $V\left[\widetilde{\boldsymbol{\eta}}^{\mathsf{T}} \,\middle|\, \epsilon_0\right]$ is a conditional variance of $\widetilde{\boldsymbol{\eta}}$, given the initial environment $\epsilon_0$. The initial environment is distributed according to the stationary distribution $\boldsymbol{\pi}$, so the unconditional longevity $\boldsymbol{\eta}$ follows a finite mixture distribution with mixing distribution $\boldsymbol{\pi}$.

The unconditional variance of *η*, taking account of both sources of variability, is

$$V\left[\widetilde{\boldsymbol{\eta}}^{\mathsf{T}}\right] = V\left[E(\widetilde{\boldsymbol{\eta}}^{\mathsf{T}}|\boldsymbol{\epsilon}\_{0})\right] + E\_{\pi}\left[V(\widetilde{\boldsymbol{\eta}}^{\mathsf{T}}|\boldsymbol{\epsilon}\_{0})\right] \tag{5.81}$$

where *Eπ* denotes the expectation over the stationary distribution *π* of initial environments (Rényi 1970, p. 275, Theorem 1). This can be rearranged as

$$\begin{aligned} V\left[\widetilde{\boldsymbol{\eta}}^{\mathsf{T}}\right] &= E_{\pi}\left[\widetilde{\boldsymbol{\eta}}^{\mathsf{T}} \circ \widetilde{\boldsymbol{\eta}}^{\mathsf{T}}\right] - E_{\pi}\left[\widetilde{\boldsymbol{\eta}}^{\mathsf{T}}\right] \circ E_{\pi}\left[\widetilde{\boldsymbol{\eta}}^{\mathsf{T}}\right] + E_{\pi}\left[V(\widetilde{\boldsymbol{\eta}}^{\mathsf{T}}|\epsilon_0)\right] \\ &= \left[E\left(\widetilde{\boldsymbol{\eta}}^{\mathsf{T}}\right) \circ E\left(\widetilde{\boldsymbol{\eta}}^{\mathsf{T}}\right)\right]\left(\mathbf{I}_s \otimes \boldsymbol{\pi}\right) - \left[E\left(\widetilde{\boldsymbol{\eta}}^{\heartsuit}\right) \circ E\left(\widetilde{\boldsymbol{\eta}}^{\heartsuit}\right)\right]^{\mathsf{T}} \\ &\quad + V\left[\widetilde{\boldsymbol{\eta}}^{\mathsf{T}}|\epsilon_0\right]\left(\mathbf{I}_s \otimes \boldsymbol{\pi}\right) \end{aligned} \tag{5.82}$$

(e.g., Frühwirth-Schnatter 2006, p. 10). This variance decomposition has developed into a powerful tool for the analysis of heterogeneity in demography (Edwards 2011; Hartemink and Caswell 2018; Hartemink et al. 2017; Caswell et al. 2018; Jenouvrier et al. 2018).
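The decomposition (5.81) is the law of total variance for a finite mixture. A toy numeric check (the conditional means, variances, and mixing probabilities below are invented for illustration, not taken from the text) verifies it by Monte Carlo:

```python
import numpy as np

# two environmental states with illustrative conditional moments of longevity
pi = np.array([0.6, 0.4])        # mixing (stationary) distribution
mean_c = np.array([5.0, 8.0])    # E(eta | eps0)
var_c = np.array([4.0, 9.0])     # V(eta | eps0)

total_mean = pi @ mean_c
V_of_E = pi @ (mean_c - total_mean) ** 2   # variance of conditional means (between)
E_of_V = pi @ var_c                        # mean of conditional variances (within)
V_total = V_of_E + E_of_V                  # eq (5.81)

# Monte Carlo check: mix two normal samples in the stated proportions
rng = np.random.default_rng(3)
k = rng.choice(2, size=200_000, p=pi)
x = rng.normal(mean_c[k], np.sqrt(var_c[k]))
assert abs(x.var() - V_total) < 0.1
```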

The choice of the mixing distribution *π* is important. Hernandez-Suarez et al. (2012) present an alternative where *π* is the stationary distribution of births across environments, rather than the distribution of environments itself.

#### *5.5.4 A Time-Varying Example: Lomatium bradshawii*

*Lomatium bradshawii* is an endangered herbaceous perennial plant, found in only a few isolated populations in prairies of Oregon and Washington. These habitats were, until recent times, subject to natural and anthropogenic fires, to which *L. bradshawii* seems to have adapted. Fall-season fires increase plant size and seedling recruitment, but the effect fades within a few years. Populations in burned areas have higher growth rates and lower probabilities of extinction than unburned populations (Caswell and Kaye 2001).

A stochastic demographic model for *L. bradshawii* was developed by Caswell and Kaye (2001), Kaye et al. (2001), and Kaye and Pyke (2003) based on data from an experimental study using controlled burning. Individuals were classified into six stages based on size and reproductive status: yearlings, small and large vegetative plants, and small, medium, and large reproductive plants. The environment was classified into four states defined by fire history: the year of a fire and 1, 2, and 3+ years post-fire. Projection matrices were estimated in each environment; the example here is based on one of the two sites (Rose Prairie) in the original study. The matrices are given in Caswell and Kaye (2001).

*L. bradshawii* performs well under recently burned conditions, but less well in sites that have not been recently burned. For example, the values of *λ* are

$$\begin{array}{lcccc} \text{Years post-fire:} & 0 & 1 & 2 & \geq 3 \\ \text{Growth rate } \lambda: & 1.18 & 1.12 & 0.48 & 0.88 \end{array}$$

Caswell and Kaye (2001) found a minimum frequency of fire (0.4–0.5) below which the stochastic growth rate was negative and the population would be unable to persist. Effects of autocorrelation were small, but positive autocorrelation reduced the stochastic growth rate.

As an example of a time-varying analysis, let us examine *L. bradshawii* in a Markovian environment. Let *f* be the long-term frequency of fire, and *ρ* the temporal autocorrelation. Then the transition matrix for environmental states is

$$\mathbf{D} = \begin{pmatrix} p & q & q & q \\ 1-p & 0 & 0 & 0 \\ 0 & 1-q & 0 & 0 \\ 0 & 0 & 1-q & 1-q \end{pmatrix} \tag{5.83}$$

where *q* = *f (*1 − *ρ)* and *p* = *ρ* + *q*.
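The construction in (5.83) guarantees that the long-run frequency of the fire state equals *f*. A short sketch builds **D** for the parameter values used below (*f* = 0.5, *ρ* = 0.7) and confirms both column-stochasticity and that property:

```python
import numpy as np

def fire_matrix(f, rho):
    # Environmental transition matrix D of eq (5.83);
    # states are the year of a fire and 1, 2, 3+ years post-fire.
    qq = f * (1 - rho)
    p = rho + qq
    return np.array([[p,      qq,     qq,     qq    ],
                     [1 - p,  0.0,    0.0,    0.0   ],
                     [0.0,    1 - qq, 0.0,    0.0   ],
                     [0.0,    0.0,    1 - qq, 1 - qq]])

D = fire_matrix(0.5, 0.7)
assert np.allclose(D.sum(axis=0), 1.0)     # column-stochastic

# stationary distribution: eigenvector of D for eigenvalue 1
vals, vecs = np.linalg.eig(D)
w = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
stat = w / w.sum()
assert abs(stat[0] - 0.5) < 1e-8           # long-run frequency of fire equals f
```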

Figure 5.7a shows the life expectancy $E(\widetilde{\boldsymbol{\eta}} \,|\, \epsilon_0)$ of *L. bradshawii* as a function of initial stage and initial environmental state, from (5.78). Life expectancy increases with the stage (size) of a plant. A seedling has its greatest life expectancy in the year of a fire, and less in an environment three or more years post-fire. A large flowering plant, in contrast, has its greatest life expectancy in an environment three or more years post-fire. When the environment-dependence is averaged over the stationary distribution of environmental states, there is a smooth increase in life expectancy from ∼2.5 years for a seedling to 8 years for a large flowering plant (Fig. 5.7b). The standard deviation of longevity also increases with stage, in a pattern very similar to that of the expectation.

These patterns in the mean and variance of longevity (Fig. 5.7) depend on the stochastic properties of the environment—in this case, the frequency *f* and autocorrelation *ρ* of fires. Even with an environmental model this simple, the effects of *f* and *ρ* can be complicated. I know of no previous attempts to examine their effects on longevity. To do so, I calculated life expectancy with *f* = 0.5 for autocorrelation −1 < *ρ* < 1, and with *ρ* = 0 for fire frequency 0 < *f* < 1.

**Fig. 5.7** The expectation and standard deviation of longevity for *Lomatium bradshawii* in a stochastic fire environment. (**a**) Expected longevity conditional on initial environment $\epsilon_0$. (**b**) Expected longevity averaged over the stationary distribution of initial environments. (**c**) The standard deviation of longevity conditional on initial environment. (**d**) The standard deviation of longevity over the stationary distribution of initial environments. The frequency of fire is 0.5 and the temporal autocorrelation *ρ* = 0.7

The life expectancy of early life cycle stages increases monotonically with fire frequency (Fig. 5.8a), but the life expectancy of large reproductive plants is greatest at either low or high fire frequencies. The standard deviation of longevity increases with *f* (Fig. 5.8b). As *f* → 1, the standard deviation of longevity is approximately twice the mean.

The autocorrelation of fires has little effect on the life expectancy of seedlings, but a larger effect on that of large plants. For the latter, life expectancy is maximized as *ρ* → −1 (alternating fire and non-fire years) or as *ρ* → 1 (long periods of fires alternating with long periods without fire). The standard deviation of longevity also shows a strong U-shaped response to *ρ* for all stages. The generality of this pattern is unknown.

**Fig. 5.8** The expectation and standard deviation of longevity, averaged over the stationary distribution of initial environments, for *Lomatium bradshawii*, as a function of the initial stage, the fire frequency *f*, and the temporal autocorrelation *ρ*. Parameters as in Fig. 5.7. (**a**) Life expectancy $E(\widetilde{\boldsymbol{\eta}}^{\heartsuit})$. (**b**) Standard deviation of longevity. (**c**) Life expectancy $E(\widetilde{\boldsymbol{\eta}}^{\heartsuit})$. (**d**) Standard deviation of longevity

#### **5.6 The Importance of Individual Stochasticity**

The concept of individual stochasticity strikes to the heart of one of the most fundamental problems in population biology: the sources of variability among individuals. Heterogeneity—genuine differences among individuals—translates into differences in the age- or stage-specific vital rates to which they are subject. Heterogeneity may arise from genetics, from physiological effects, from health conditions, or from unknown causes ("frailty," "quality"). Stochasticity results from the random outcomes of probabilistic processes. Markov chains naturally treat individual trajectories (i.e., individual lives) as realizations of an underlying stochastic process, and so much of this chapter has been focused on the analysis of individual stochasticity. The distinction is particularly important in evolutionary demography, where variance in lifetime reproductive output is routinely treated as variance in fitness, or a component of fitness. See Sect. 5.4.4 for some recent work on this problem.

Individual stochasticity is an important component of demography, for both human and non-human populations. It complements environmental stochasticity (externally imposed random changes in vital rates) and demographic stochasticity (randomness in the growth of populations due to stochastic survival and reproduction) (Caswell and Vindenes 2018). Individual stochasticity reflects randomness in the pathways that individuals take through the life cycle. It expresses itself in interindividual variation in occupancy times, longevity, lifetime reproductive output, and other outcomes. The availability of methods based on Markov chains promises to change the way population biologists approach the analysis of variance among individuals (Caswell 2011; Tuljapurkar et al. 2009; Steiner and Tuljapurkar 2012; van Daalen and Caswell 2015; van Daalen and Caswell 2017).

#### **5.7 Discussion**

Taking advantage of the Markov chain formulation of the life cycle opens up a wealth of demographic information. The age-classified information extracted from a stage-classified model can form a valuable component of behavioral studies, especially if the model (like the right whale example) includes reproductive behavior as part of the life cycle structure. Longevity provides a powerful way to compare mortality schedules among species, populations, or environmental conditions, but it has been inaccessible to stage-classified analysis prior to the development of Markov chain methods. The generation time characterizes an important population time scale, with implications in conservation (IUCN Species Survival Commission 2001), but there has been no way to compute it from stage-classified models.

Stage-classified life cycles may have consequences that are not yet appreciated, but must be considered when interpreting the results. For example, any stage-classified model eventually leads to an age-independent mortality rate (Horvitz and Tuljapurkar 2008), and so is of limited use in the study of senescence. This fact has consequences for life expectancy and variance in longevity that are not well understood (at least by me). For the right whale, expected longevity at birth is 32 years with a standard deviation of 34 years. It is unlikely that there are appreciable numbers of whales alive at even one standard deviation above this mean. The high survival probability and the assumption of age-independence lead to the high standard deviation. Those of us who work with stage-classified models are accustomed to this, but discount its importance because it (often) has little effect on *λ*. It will be important to determine the stochastic consequences of simplifying assumptions in the life cycle graph.

This chapter does not begin to exhaust the information that can be extracted from the Markov chain formulation of a stage-classified model. Three examples of particular interest are the occupancy of sets of states, the problem of competing risks, and the calculation of passage times. It is often of interest to calculate the statistics of occupancy of sets of states (e.g., all reproductive classes, or all stages in some particular health condition). We have seen how to calculate the moments of the occupancy time of single states. The mean occupancy time of a set of states is the sum of the mean occupancy times of each state, but that is not true for the variance or higher moments. Roth and Caswell (2018) derived a general expression for all the statistics, and the complete distribution, of occupancy time for any set of states. If more than one absorbing state exists (e.g., death at different stages, or from different causes), then the risks of absorption compete, because an individual can only be absorbed (i.e., die) once. It is possible to calculate the probability of absorption in each state, and to explore the effects of changing one risk on the probability of experiencing another (Caswell and Ouellette 2018). Passage times refer to the time required to get from one stage to another in the life cycle. An important passage time is the birth interval: the time from one birth to the next. This can only be calculated for individuals that do reproduce a second time (otherwise the interval is infinite), and so it requires developing a chain that is conditional on successfully reaching the reproductive state (Caswell 2001). In species that produce only one or a few offspring, reproduction cannot be adjusted in response to the environment by changing offspring number, and so changes in the birth interval are particularly important in such species.

#### **A Appendix: Derivations**

This appendix contains derivations of many of the results in this chapter, especially for sensitivities. Taking advantage of the freedom from length limits, I have tried to show the derivations step by step. Recall the definitions of the Hadamard product

$$\mathbf{A} \circ \mathbf{B} = \left(a_{ij} b_{ij}\right), \tag{5.84}$$

the Kronecker product

$$\mathbf{A} \otimes \mathbf{B} = \left(a_{ij} \mathbf{B}\right), \tag{5.85}$$

the vec operator

$$\operatorname{vec}\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a \\ c \\ b \\ d \end{pmatrix}, \tag{5.86}$$

and Roth's theorem

$$\operatorname{vec}\left(\mathbf{ABC}\right) = \left(\mathbf{C}^{\mathsf{T}} \otimes \mathbf{A}\right)\operatorname{vec}\mathbf{B}.\tag{5.87}$$
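These identities are easy to verify numerically; the sketch below checks Roth's theorem (5.87) and, with the diagonal-matrix operator $\mathcal{D}$, the vec form of the Hadamard product used later as (5.91). All matrices are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

vec = lambda M: M.flatten(order='F')   # column-stacking vec operator, eq (5.86)

# Roth's theorem, eq (5.87): vec(ABC) = (C' ⊗ A) vec B
assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))

# vec of a Hadamard product, eq (5.91): vec(P ∘ Q) = D(vec P) vec Q
P = rng.standard_normal((3, 3))
Q = rng.standard_normal((3, 3))
assert np.allclose(vec(P * Q), np.diag(vec(P)) @ vec(Q))
```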

#### *A.1 Variance in Occupancy Times*

The occupancy time in transient state *i*, starting from transient state *j*, is $\nu_{ij}$. The matrix of variances of the $\nu_{ij}$ is

$$\mathbf{V} = \left(V(\nu_{ij})\right) = \left(2\mathbf{N}_{\mathrm{dg}} - \mathbf{I}_s\right)\mathbf{N} - \mathbf{N} \circ \mathbf{N} \tag{5.88}$$

(Caswell 2006, derived from Theorem 3.1 of Iosifescu 1980) where **N**dg is a matrix with the diagonal elements of **N** on its diagonal and zeros elsewhere; it can be written

$$\mathbf{N}_{\mathrm{dg}} = \mathbf{I}_s \circ \mathbf{N} \tag{5.89}$$

Differentiating both sides of (5.88) gives

$$d\mathbf{V} = 2(\mathbf{I}_s \circ d\mathbf{N})\mathbf{N} + 2(\mathbf{I}_s \circ \mathbf{N})(d\mathbf{N}) - d\mathbf{N} - (d\mathbf{N}) \circ \mathbf{N} - \mathbf{N} \circ (d\mathbf{N}) \tag{5.90}$$

The next step is to apply the vec operator to both sides. The vec of a Hadamard product can be written in two ways:

$$\operatorname{vec}\left(\mathbf{A} \circ \mathbf{B}\right) = \mathcal{D}\left(\operatorname{vec}\mathbf{A}\right)\operatorname{vec}\mathbf{B} = \mathcal{D}\left(\operatorname{vec}\mathbf{B}\right)\operatorname{vec}\mathbf{A}. \tag{5.91}$$

Using this result and Roth's theorem (5.87) gives

$$d\operatorname{vec}\mathbf{V} = 2\left(\mathbf{N}^{\mathsf{T}} \otimes \mathbf{I}_s\right)\mathcal{D}\left(\operatorname{vec}\mathbf{I}_s\right)d\operatorname{vec}\mathbf{N} + 2\left[\mathbf{I}_s \otimes (\mathbf{I}_s \circ \mathbf{N})\right]d\operatorname{vec}\mathbf{N} - d\operatorname{vec}\mathbf{N} - 2\mathcal{D}\left(\operatorname{vec}\mathbf{N}\right)d\operatorname{vec}\mathbf{N} \tag{5.92}$$

Factoring out *d*vec **N** and using the chain rule gives the final result

$$\frac{d\operatorname{vec}\mathbf{V}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left[2\left(\mathbf{N}^{\mathsf{T}} \otimes \mathbf{I}_s\right)\mathcal{D}\left(\operatorname{vec}\mathbf{I}_s\right) + 2\left(\mathbf{I}_s \otimes \mathbf{N}_{\mathrm{dg}}\right) - \mathbf{I}_{s^2} - 2\mathcal{D}\left(\operatorname{vec}\mathbf{N}\right)\right]\frac{d\operatorname{vec}\mathbf{N}}{d\boldsymbol{\theta}^{\mathsf{T}}} \tag{5.93}$$
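As a sanity check on the starting point (5.88)–(5.89) of this derivation, the analytical mean and variance of occupancy times can be compared with simulated individual trajectories. The 2 × 2 matrix **U** below is illustrative, not from the text.

```python
import numpy as np

U = np.array([[0.4, 0.1],
              [0.3, 0.5]])               # illustrative transient matrix (columns sum < 1)
s = U.shape[0]
N = np.linalg.inv(np.eye(s) - U)
Ndg = np.eye(s) * N                      # eq (5.89): I ∘ N keeps the diagonal of N
V = (2 * Ndg - np.eye(s)) @ N - N * N    # eq (5.88); * is the Hadamard product

rng = np.random.default_rng(5)
n, start = 20_000, 0
counts = np.zeros((n, s))
for r in range(n):                       # follow one individual until absorption (death)
    state = start
    while state is not None:
        counts[r, state] += 1
        cum = np.cumsum(U[:, state])     # column of U gives transition probabilities
        u = rng.random()
        state = int(np.searchsorted(cum, u, side='right')) if u < cum[-1] else None

# simulated moments of occupancy times match columns of N and V
assert np.allclose(counts.mean(axis=0), N[:, start], atol=0.05)
assert np.allclose(counts.var(axis=0), V[:, start], atol=0.25)
```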

#### *A.2 Life Expectancy*

Let $\eta_i$ be the time to absorption (i.e., death) of an individual currently in stage *i*. The vector $E(\boldsymbol{\eta})$ of expected values of the $\eta_i$ satisfies

$$E(\boldsymbol{\eta})^{\mathsf{T}} = \mathbf{1}^{\mathsf{T}}\mathbf{N} \tag{5.94}$$

where **1** is a vector of ones. Differentiating both sides gives

$$dE(\boldsymbol{\eta})^{\mathsf{T}} = \mathbf{1}^{\mathsf{T}}(d\mathbf{N}) \tag{5.95}$$

Applying the vec operator gives

$$dE(\eta) = \left(\mathbf{I}\_s \otimes \mathbf{1}^\mathsf{T}\right) d\mathbf{vec} \,\mathbf{N} \tag{5.96}$$

Applying the identification theorem and the chain rule, and using (5.16) for the sensitivity of the fundamental matrix, gives

$$\frac{dE(\boldsymbol{\eta})}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\mathbf{I}_s \otimes \mathbf{1}^{\mathsf{T}}\right)\left(\mathbf{N}^{\mathsf{T}} \otimes \mathbf{N}\right)\frac{d\operatorname{vec}\mathbf{U}}{d\boldsymbol{\theta}^{\mathsf{T}}} \tag{5.97}$$

This gives the derivative of the entire vector of life expectancies. Suppose that stage 1 corresponds to birth. The life expectancy at birth is then

$$E(\eta_1) = \mathbf{1}^{\mathsf{T}}\mathbf{N}\mathbf{e}_1 \tag{5.98}$$

where **e**<sup>1</sup> is a vector with 1 in the first position and zeros elsewhere. Following the same derivation gives

$$\frac{dE(\eta_1)}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\mathbf{e}_1^{\mathsf{T}} \otimes \mathbf{1}^{\mathsf{T}}\right)\left(\mathbf{N}^{\mathsf{T}} \otimes \mathbf{N}\right)\frac{d\operatorname{vec}\mathbf{U}}{d\boldsymbol{\theta}^{\mathsf{T}}} \tag{5.99}$$
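The sensitivity formula (5.97) can be checked by finite differences. Here $\theta$ is taken to be a single entry of **U** (the choice of $u_{21}$ and the matrix itself are illustrative), so $d\operatorname{vec}\mathbf{U}/d\theta$ is a unit vector and the corresponding column of the sensitivity matrix should match a numerical derivative.

```python
import numpy as np

U = np.array([[0.4, 0.1],
              [0.3, 0.5]])              # illustrative transient matrix
s = U.shape[0]
I = np.eye(s)
N = np.linalg.inv(I - U)

# eq (5.97): dE(eta)/dvec(U)' = (I ⊗ 1')(N' ⊗ N), an s x s^2 matrix
sens = np.kron(I, np.ones((1, s))) @ np.kron(N.T, N)

# finite-difference check on u_21 (vec index 1, column-major order)
h = 1e-7
Up = U.copy(); Up[1, 0] += h
fd = (np.ones(s) @ np.linalg.inv(I - Up) - np.ones(s) @ N) / h
assert np.allclose(sens[:, 1], fd, atol=1e-4)
```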

#### *A.3 Variance in Longevity*

The variance of the time to absorption satisfies

$$V(\boldsymbol{\eta})^{\mathsf{T}} = \mathbf{1}^{\mathsf{T}}\mathbf{N}\left(2\mathbf{N} - \mathbf{I}\right) - E\left(\boldsymbol{\eta}^{\mathsf{T}}\right) \circ E\left(\boldsymbol{\eta}^{\mathsf{T}}\right) \tag{5.100}$$

(Caswell 2006, derived from Theorem 3.2 of Iosifescu 1980). Differentiating gives

$$dV(\boldsymbol{\eta})^{\mathsf{T}} = 2\,\mathbf{1}^{\mathsf{T}}(d\mathbf{N})\mathbf{N} + 2\,\mathbf{1}^{\mathsf{T}}\mathbf{N}(d\mathbf{N}) - \mathbf{1}^{\mathsf{T}}(d\mathbf{N}) - 2E\left(\boldsymbol{\eta}^{\mathsf{T}}\right) \circ dE\left(\boldsymbol{\eta}^{\mathsf{T}}\right) \tag{5.101}$$

Applying the vec operator and Roth's theorem (5.87), using (5.91) for the vec of the Hadamard product, gives

$$dV(\boldsymbol{\eta}) = \left[2\left(\mathbf{N}^{\mathsf{T}} \otimes \mathbf{1}^{\mathsf{T}}\right) + 2\left(\mathbf{I}_s \otimes \mathbf{1}^{\mathsf{T}}\mathbf{N}\right) - \left(\mathbf{I}_s \otimes \mathbf{1}^{\mathsf{T}}\right)\right]d\operatorname{vec}\mathbf{N} - 2\mathcal{D}\left(E(\boldsymbol{\eta})\right)dE(\boldsymbol{\eta}) \tag{5.102}$$

Substituting (5.96) for *dE(η)* gives

$$\begin{aligned} dV(\boldsymbol{\eta}) &= \left[ 2 \left( \mathbf{N}^{\mathsf{T}} \otimes \mathbf{1}^{\mathsf{T}} \right) + 2 \left( \mathbf{I}\_{s} \otimes \mathbf{1}^{\mathsf{T}} \mathbf{N} \right) \right. \\ &\quad - \left( \mathbf{I}\_{s} \otimes \mathbf{1}^{\mathsf{T}} \right) - 2 \mathcal{D} \left( E(\boldsymbol{\eta}) \right) \left( \mathbf{I}\_{s} \otimes \mathbf{1}^{\mathsf{T}} \right) \right] d \mathbf{vec} \, \mathbf{N} \end{aligned} \tag{5.103}$$

Using (5.16) for the sensitivity of **N**, the identification theorem, and the chain rule finally leads to

$$\frac{dV(\boldsymbol{\eta})}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left[2\left(\mathbf{N}^{\mathsf{T}} \otimes \mathbf{1}^{\mathsf{T}}\right) + 2\left(\mathbf{I}_s \otimes \mathbf{1}^{\mathsf{T}}\mathbf{N}\right) - \left(\mathbf{I}_s \otimes \mathbf{1}^{\mathsf{T}}\right) - 2\mathcal{D}\left(E(\boldsymbol{\eta})\right)\left(\mathbf{I}_s \otimes \mathbf{1}^{\mathsf{T}}\right)\right]\left(\mathbf{N}^{\mathsf{T}} \otimes \mathbf{N}\right)\frac{d\operatorname{vec}\mathbf{U}}{d\boldsymbol{\theta}^{\mathsf{T}}} \tag{5.104}$$

#### *A.4 Net Reproductive Rate*

The net reproductive rate $R_0$ is given by the dominant eigenvalue of **FN**. Let **y** and **x** be the right and left eigenvectors, respectively, of **FN**, corresponding to $R_0$, scaled so that $\mathbf{x}^{\mathsf{T}}\mathbf{y} = 1$. The matrix calculus version of the standard eigenvalue perturbation result (e.g., Caswell 1978) gives

$$\begin{aligned} dR_0 &= \mathbf{x}^{\mathsf{T}}\,d(\mathbf{F}\mathbf{N})\,\mathbf{y} \\ &= \mathbf{x}^{\mathsf{T}}\left[(d\mathbf{F})\mathbf{N} + \mathbf{F}(d\mathbf{N})\right]\mathbf{y} \end{aligned} \tag{5.105}$$

Applying the vec operator to both sides gives

$$dR_0 = \left(\mathbf{y}^{\mathsf{T}}\mathbf{N}^{\mathsf{T}} \otimes \mathbf{x}^{\mathsf{T}}\right)d\operatorname{vec}\mathbf{F} + \left(\mathbf{y}^{\mathsf{T}} \otimes \mathbf{x}^{\mathsf{T}}\mathbf{F}\right)d\operatorname{vec}\mathbf{N} \tag{5.106}$$

Applying the chain rule and the result (5.14) for *d*vec **N** gives the sensitivity of *R*<sup>0</sup> in terms of effects of the parameter vector *θ* on the fertility matrix **F** and the transient matrix **U**:

$$\frac{dR_{0}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\mathbf{y}^{\mathsf{T}}\mathbf{N}^{\mathsf{T}}\otimes\mathbf{x}^{\mathsf{T}}\right)\frac{d\,\text{vec}\,\mathbf{F}}{d\boldsymbol{\theta}^{\mathsf{T}}} + \left(\mathbf{y}^{\mathsf{T}}\otimes\mathbf{x}^{\mathsf{T}}\mathbf{F}\right)\left(\mathbf{N}^{\mathsf{T}}\otimes\mathbf{N}\right)\frac{d\,\text{vec}\,\mathbf{U}}{d\boldsymbol{\theta}^{\mathsf{T}}} \tag{5.107}$$
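The result (5.107) is easy to check numerically. The sketch below (Python/NumPy; the book's own computations use MATLAB, and the matrices **U** and **F** here are hypothetical illustrations, not from the text) evaluates (5.107) for *θ* = vec **U**, so that the *d* vec **F** term vanishes, and compares it with finite differences:

```python
import numpy as np

# Hypothetical 3-stage matrices for illustration (not from the text)
U = np.array([[0.0, 0.0, 0.0],
              [0.3, 0.4, 0.0],
              [0.0, 0.5, 0.8]])   # transient (survival/transition) matrix
F = np.array([[0.0, 1.0, 2.5],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])   # fertility matrix
s = U.shape[0]

def r0(U):
    """Net reproductive rate: dominant eigenvalue of F N, N = (I - U)^{-1}."""
    N = np.linalg.inv(np.eye(s) - U)
    return np.max(np.abs(np.linalg.eigvals(F @ N)))

N = np.linalg.inv(np.eye(s) - U)
R = F @ N

# right (y) and left (x) eigenvectors of FN, scaled so that x^T y = 1
lam, W = np.linalg.eig(R)
y = np.real(W[:, np.argmax(np.abs(lam))])
lam2, V = np.linalg.eig(R.T)
x = np.real(V[:, np.argmax(np.abs(lam2))])
x = x / (x @ y)

# Eq. (5.107) with theta = vec U: (y^T ⊗ x^T F)(N^T ⊗ N) = (N y)^T ⊗ (x^T F N)
sens = np.kron(N @ y, x @ F @ N)     # row vector d R0 / d (vec U)^T

# finite-difference check (vec stacks columns: U[i,j] is entry j*s + i)
h = 1e-7
fd = np.zeros(s * s)
for j in range(s):
    for i in range(s):
        Up = U.copy()
        Up[i, j] += h
        fd[j * s + i] = (r0(Up) - r0(U)) / h
```

The two Jacobians agree to finite-difference accuracy.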

#### *A.5 Cohort Generation Time*

To derive the cohort generation time, we begin at time *t* = 0 with an individual newly born in stage *j*. This tiny cohort is described by an initial vector **e**<sub>*j*</sub>. The expected survivors of this cohort at time *t* are **U**<sup>*t*</sup>**e**<sub>*j*</sub>. The expected offspring produced by these survivors at time *t* are **FU**<sup>*t*</sup>**e**<sub>*j*</sub>. Summing over the lifetime of the cohort gives a vector of expected lifetime reproduction, of all types of offspring,

$$\begin{aligned} E(\text{total offspring}) &= \sum_{t=0}^{\infty} \mathbf{F} \mathbf{U}^{t} \mathbf{e}_{j} \\ &= \mathbf{F} \left( \sum_{t=0}^{\infty} \mathbf{U}^{t} \right) \mathbf{e}_{j} \\ &= \mathbf{F} \mathbf{N} \mathbf{e}_{j} \end{aligned} \tag{5.108}$$

Let **m**<sup>(*j*)</sup>(*t*) be the vector of offspring production at time *t*, expressed as a *proportion* of the lifetime total of the individual starting in stage *j*. Then

$$\mathbf{m}^{(j)}(t) = \mathcal{D} \left(\mathbf{F} \mathbf{N} \mathbf{e}_{j}\right)^{-1} \left(\mathbf{F} \mathbf{U}^{t} \mathbf{e}_{j}\right) \tag{5.109}$$

If no offspring of some stage, say stage *i*, are produced, then set *m*<sub>*i*</sub><sup>(*j*)</sup>(*t*) = 0.

The cohort generation time *μ*<sup>(*j*)</sup> is the expectation of the distribution defined by **m**<sup>(*j*)</sup>(*t*):

$$\begin{split} \mu^{(j)} &= \sum_{t=0}^{\infty} t\, \mathbf{m}^{(j)}(t) \\ &= \sum_{t} t\, \mathcal{D} \left( \mathbf{F} \mathbf{N} \mathbf{e}_{j} \right)^{-1} \mathbf{F} \mathbf{U}^{t} \mathbf{e}_{j} \\ &= \mathcal{D} \left( \mathbf{F} \mathbf{N} \mathbf{e}_{j} \right)^{-1} \mathbf{F} \left( \sum_{t} t\, \mathbf{U}^{t} \right) \mathbf{e}_{j} . \end{split} \tag{5.110}$$

The summation can be simplified

$$\begin{aligned} \sum_{t} t\, \mathbf{U}^{t} &= \mathbf{0} + \mathbf{U} + 2\mathbf{U}^2 + 3\mathbf{U}^3 + \cdots \\ &= \mathbf{U} \left[ \mathbf{0} + \mathbf{U} + 2\mathbf{U}^2 + \cdots + \mathbf{I} + \mathbf{U} + \mathbf{U}^2 + \cdots \right] \\ &= \mathbf{U} \left[ \sum_{t} t\, \mathbf{U}^{t} + \mathbf{N} \right]. \end{aligned} \tag{5.111}$$

Solving this for the summation (premultiplying by **N** = (**I** − **U**)<sup>−1</sup>) gives

$$\sum_{t} t\, \mathbf{U}^{t} = \mathbf{N} \mathbf{U} \mathbf{N}. \tag{5.112}$$
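The identity (5.112) is easy to verify numerically; a minimal sketch (Python/NumPy, with a hypothetical transient matrix **U** whose spectral radius is below 1, so the series converges):

```python
import numpy as np

# Hypothetical transient matrix (spectral radius < 1)
U = np.array([[0.0, 0.0, 0.0],
              [0.3, 0.4, 0.0],
              [0.0, 0.5, 0.8]])
s = U.shape[0]
N = np.linalg.inv(np.eye(s) - U)   # fundamental matrix

# partial sums of sum_t t U^t
S = np.zeros((s, s))
P = np.eye(s)                      # U^0
for t in range(1, 600):
    P = P @ U                      # P = U^t
    S += t * P

closed_form = N @ U @ N            # eq. (5.112)
```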

Putting all the pieces together gives the generation time

$$
\mu^{(j)} = \mathcal{D} \left( \mathbf{F}\mathbf{N}\mathbf{e}_{j} \right)^{-1} \mathbf{F}\mathbf{N}\mathbf{U}\mathbf{N}\mathbf{e}_{j}. \tag{5.113}
$$
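As a numerical illustration of (5.113), the sketch below compares the closed form with the direct expectation Σ<sub>*t*</sub> *t* **m**<sup>(*j*)</sup>(*t*). The matrices are hypothetical, chosen so that every offspring type is produced (making the diagonal matrix invertible); otherwise the undefined entries would be set to 0 as described above.

```python
import numpy as np

# Hypothetical matrices; every row of F has a positive entry
U = np.array([[0.0, 0.0, 0.0],
              [0.3, 0.4, 0.0],
              [0.0, 0.5, 0.8]])
F = np.array([[0.0, 1.0, 2.5],
              [0.1, 0.0, 0.2],
              [0.0, 0.3, 0.0]])
s = U.shape[0]
j = 0                               # starting stage
ej = np.zeros(s); ej[j] = 1.0
N = np.linalg.inv(np.eye(s) - U)

lifetime = F @ N @ ej               # expected lifetime offspring, by type (5.108)

# closed form (5.113): mu = D(F N e_j)^{-1} F N U N e_j
mu_closed = (F @ N @ U @ N @ ej) / lifetime

# direct expectation: mu = sum_t t m(t), m(t) = D(F N e_j)^{-1} F U^t e_j
acc = np.zeros(s)
P = np.eye(s)
for t in range(1, 600):
    P = P @ U
    acc += t * (F @ P @ ej)
mu_direct = acc / lifetime
```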

#### **A.5.1 Sensitivity of Generation Time**

To differentiate (5.113) may seem complicated. To make life easier, define some notation,

$$\mathbf{X} = \mathcal{D} \left( \mathbf{F} \mathbf{N} \mathbf{e}_{j} \right) \tag{5.114}$$

$$\mathbf{r} = \mathbf{F} \mathbf{N} \mathbf{U} \mathbf{N} \mathbf{e}_{j} \tag{5.115}$$

in terms of which (5.113) becomes

$$
\mu^{(j)} = \mathbf{X}^{-1} \mathbf{r}.\tag{5.116}
$$

Differentiate,

$$d\mu^{(j)} = d\left(\mathbf{X}^{-1}\right)\mathbf{r} + \mathbf{X}^{-1}d\mathbf{r} \tag{5.117}$$

and apply the vec operator

$$d\mu^{(j)} = \left(\mathbf{r}^{\mathsf{T}} \otimes \mathbf{I}\right) d\text{vec}\,\mathbf{X}^{-1} + \mathbf{X}^{-1} d\text{vec}\,\mathbf{r}.\tag{5.118}$$

The same steps that led to Eq. (5.14) for *d*vec **N**, and noting that **X** is symmetric, leads to

$$d\text{vec}\,\mathbf{X}^{-1} = -\left(\mathbf{X}^{-1}\otimes\mathbf{X}^{-1}\right)d\text{vec}\,\mathbf{X}.\tag{5.119}$$

The differential of vec **X** is obtained by writing

$$\mathbf{X} = \mathbf{I} \circ \left(\mathbf{F} \mathbf{N} \mathbf{e}_{j} \mathbf{1}^{\mathsf{T}}\right). \tag{5.120}$$

Differentiating and using the rule (5.91) for the vec of a Hadamard product gives

$$d\,\text{vec}\,\mathbf{X} = \mathcal{D}\left(\text{vec}\,\mathbf{I}\right) \left[ \left(\mathbf{1}\mathbf{e}_{j}^{\mathsf{T}}\mathbf{N}^{\mathsf{T}}\otimes\mathbf{I}\right)d\,\text{vec}\,\mathbf{F} + \left(\mathbf{1}\mathbf{e}_{j}^{\mathsf{T}}\otimes\mathbf{F}\right)d\,\text{vec}\,\mathbf{N} \right] \tag{5.121}$$

Differentiating **r** and applying the vec operator gives

$$\begin{aligned} d\,\text{vec}\,\mathbf{r} &= \left[\left(\mathbf{N}\mathbf{U}\mathbf{N}\mathbf{e}_{j}\right)^{\mathsf{T}} \otimes \mathbf{I}\right]d\,\text{vec}\,\mathbf{F} + \left[\left(\mathbf{U}\mathbf{N}\mathbf{e}_{j}\right)^{\mathsf{T}} \otimes \mathbf{F}\right]d\,\text{vec}\,\mathbf{N} \\ &\quad + \left[\left(\mathbf{N}\mathbf{e}_{j}\right)^{\mathsf{T}} \otimes \mathbf{F}\mathbf{N}\right]d\,\text{vec}\,\mathbf{U} + \left[\mathbf{e}_{j}^{\mathsf{T}} \otimes \mathbf{F}\mathbf{N}\mathbf{U}\right]d\,\text{vec}\,\mathbf{N} \end{aligned} \tag{5.122}$$

Whew!

Finally, substituting (5.119), (5.121), and (5.122) into (5.118), we obtain

$$\begin{split} \frac{d\mu^{(j)}}{d\boldsymbol{\theta}^{\mathsf{T}}} &= -\left(\mathbf{r}^{\mathsf{T}} \otimes \mathbf{I}\right) \left(\mathbf{X}^{-1} \otimes \mathbf{X}^{-1}\right) \mathcal{D}\left(\text{vec}\,\mathbf{I}\right) \\ &\quad\times \left[\left(\mathbf{1}\mathbf{e}_{j}^{\mathsf{T}}\mathbf{N}^{\mathsf{T}} \otimes \mathbf{I}\right) \frac{d\,\text{vec}\,\mathbf{F}}{d\boldsymbol{\theta}^{\mathsf{T}}} + \left(\mathbf{1}\mathbf{e}_{j}^{\mathsf{T}} \otimes \mathbf{F}\right) \frac{d\,\text{vec}\,\mathbf{N}}{d\boldsymbol{\theta}^{\mathsf{T}}}\right] \\ &\quad+ \mathbf{X}^{-1}\left\{\left[\left(\mathbf{N}\mathbf{U}\mathbf{N}\mathbf{e}_{j}\right)^{\mathsf{T}} \otimes \mathbf{I}\right] \frac{d\,\text{vec}\,\mathbf{F}}{d\boldsymbol{\theta}^{\mathsf{T}}} + \left[\left(\mathbf{U}\mathbf{N}\mathbf{e}_{j}\right)^{\mathsf{T}} \otimes \mathbf{F}\right] \frac{d\,\text{vec}\,\mathbf{N}}{d\boldsymbol{\theta}^{\mathsf{T}}} \right. \\ &\quad\left.+ \left[\left(\mathbf{N}\mathbf{e}_{j}\right)^{\mathsf{T}} \otimes \mathbf{F}\mathbf{N}\right] \frac{d\,\text{vec}\,\mathbf{U}}{d\boldsymbol{\theta}^{\mathsf{T}}} + \left[\mathbf{e}_{j}^{\mathsf{T}} \otimes \mathbf{F}\mathbf{N}\mathbf{U}\right] \frac{d\,\text{vec}\,\mathbf{N}}{d\boldsymbol{\theta}^{\mathsf{T}}}\right\}. \end{split} \tag{5.123}$$

This may be an impressive formula, but it is straightforward to compute, given the derivatives of **U**, **F**, and **N** with respect to *θ*.
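To demonstrate that the computation really is straightforward, the sketch below implements (5.123) for the case *θ* = vec **U** (so that the *d* vec **F** terms vanish), using hypothetical matrices and the chain rule *d* vec **N** = (**N**<sup>T</sup> ⊗ **N**) *d* vec **U**, and checks the result against finite differences:

```python
import numpy as np

# Hypothetical matrices, chosen so that D(F N e_j) is invertible
U = np.array([[0.0, 0.0, 0.0],
              [0.3, 0.4, 0.0],
              [0.0, 0.5, 0.8]])
F = np.array([[0.0, 1.0, 2.5],
              [0.1, 0.0, 0.2],
              [0.0, 0.3, 0.0]])
s = U.shape[0]
j = 0
ej = np.zeros(s); ej[j] = 1.0
I = np.eye(s)
one = np.ones(s)

def mu_of(U):
    N = np.linalg.inv(I - U)
    return (F @ N @ U @ N @ ej) / (F @ N @ ej)

N = np.linalg.inv(I - U)
X = np.diag(F @ N @ ej)            # eq. (5.114)
r = F @ N @ U @ N @ ej             # eq. (5.115)
Xinv = np.linalg.inv(X)
dvecN = np.kron(N.T, N)            # d vec N / d (vec U)^T
DvecI = np.diag(I.reshape(-1))     # D(vec I)

# eq. (5.123) with theta = vec U, so d vec F / d theta = 0
term1 = -np.kron(r, I) @ np.kron(Xinv, Xinv) @ DvecI \
        @ np.kron(np.outer(one, ej), F) @ dvecN
term2 = Xinv @ (np.kron(U @ N @ ej, F) @ dvecN
                + np.kron(N @ ej, F @ N)
                + np.kron(ej, F @ N @ U) @ dvecN)
J = term1 + term2                  # s x s^2 Jacobian d mu / d (vec U)^T

# finite-difference check
h = 1e-6
fd = np.zeros((s, s * s))
for b in range(s):
    for a in range(s):
        Up = U.copy(); Up[a, b] += h
        fd[:, b * s + a] = (mu_of(Up) - mu_of(U)) / h
```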

#### **Bibliography**




# **Chapter 6 Age × Stage-Classified Models**

#### **6.1 Introduction**

The first step in developing any kind of structured population model is choosing one or more variables in terms of which to describe the population structure. The job of these *i*-state variables is to encapsulate all the information about the past experience of an individual that is relevant to its future behavior (Metz and Diekmann 1986; Caswell 2001). Classical demography (for both humans and non-human animals and plants) uses age as an *i*-state, but other, more biologically relevant criteria (e.g., size, developmental stage, parity, physiological condition) are now widely used in ecology, with age-classified models viewed as a special case.

However, it has long been recognized that cases exist where it is important to classify individuals by *both* age and stage.


This chapter is modified, under a Creative Commons Attribution License, from Caswell, H. 2012. Matrix models and sensitivity analysis of populations classified by age and stage: a vec-permutation matrix approach. Theoretical Ecology 5:403–417.

A familiar example is the multiregional model, in which the stage variable describes spatial location (e.g., Rogers 1966; Lebreton 1996). Models that combine age and some measure of health or disability status are an important part of health demography (e.g., Willekens 2014; Peeters et al. 2002; Wu et al. 2006; Zhou et al. 2016).

This chapter presents a model framework in which individuals are classified by age and stage, using the vec-permutation matrix approach (so-called for the role that the vec-permutation matrix plays in rearranging age and stage categories in the population vector). This formalism was introduced by Hunter and Caswell (2005) for populations classified by stage and location, was used in Chap. 5 to classify individuals by stage and environmental state; it has also been applied to stage and infection status (Klepac and Caswell 2011), stage and age (Caswell 2012; Caswell et al. 2018), and age and frailty (Caswell 2014). Megamatrix models (e.g., Pascarella and Horvitz 1998; Horvitz and Tuljapurkar 2008) can be written using this approach, as can block-structured multiregional models (e.g., Rogers 1975; Lebreton 1996). Matrix models can describe both population dynamics and cohort dynamics. Population dynamics (population growth, age and stage structure, reproductive value) depend on both the transitions of extant individuals and the production of new individuals by reproduction. In contrast, cohort dynamics (survivorship, life expectancy, age at death, generation time) depend only on the fates of already existing individuals. This chapter describes both kinds of analysis. For a more complete review and treatment, see Caswell et al. (2018).

#### **6.2 Model Construction**

The construction and analysis of these models requires a number of different matrices and operators (some of the notation is collected in Table 6.1). Individuals are classified into stages 1*,...,s* and age classes 1*,...,ω*. The model treats the processes of moving among stages and moving among age classes as alternating. First, stage-specific demography operates to move individuals among stages and to produce new offspring, with rates appropriate to their ages. Then aging acts to move individuals to the next older age, and the process repeats.

Define a stage-classified projection matrix **A***i*, of dimension *s* × *s*, for each age class, *i* = 1*,...,ω*. Decompose **A***<sup>i</sup>* into

$$\mathbf{A}_{i} = \mathbf{U}_{i} + \mathbf{F}_{i} \tag{6.1}$$

where **U***<sup>i</sup>* contains the transition probabilities of extant individuals and **F***<sup>i</sup>* describes the generation of new individuals by reproduction.


**Table 6.1** Mathematical notation used in this chapter. Dimensions are shown, where relevant, for matrices and vectors; *s* denotes the number of stages and *ω* the number of age classes

Aging is described by two matrices, each of dimension *ω* × *ω* (shown here for 3 × 3, but easily generalized),

$$\mathbf{D}_{\mathrm{U}} = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 1 \end{pmatrix} \qquad \omega \times \omega \tag{6.2}$$

$$\mathbf{D}_{\mathrm{F}} = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \qquad \omega \times \omega \tag{6.3}$$

The matrix **D**<sup>U</sup> applies to extant individuals; such an individual advances to the next age class. I have set the *(ω, ω)* entry of **D**<sup>U</sup> to 1, so that the last age class contains individuals of age *ω* and older. If this entry were set to 0, all individuals in the last age class would die. The matrix **D**<sup>F</sup> applies to individuals newly created by reproduction; such newborn individuals are placed in the first age class, regardless of the age of their parents.
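The aging matrices generalize directly to any *ω*; a small helper (Python/NumPy; the function name `aging_matrices` is mine, not from the text) constructs them:

```python
import numpy as np

def aging_matrices(omega):
    """Build D_U and D_F of (6.2)-(6.3), each omega x omega.
    D_U advances survivors one age class, with the last class open-ended
    (age omega and older); D_F places all newborns in the first age class."""
    D_U = np.diag(np.ones(omega - 1), k=-1)   # subdiagonal of ones
    D_U[-1, -1] = 1.0                         # keep the open-ended last class
    D_F = np.zeros((omega, omega))
    D_F[0, :] = 1.0                           # all newborns enter age class 1
    return D_U, D_F

D_U, D_F = aging_matrices(3)
```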

Using the matrices **A***i*, **U***i*, **F***i*, **D**U, and **D**F, construct block-diagonal matrices, each of dimension *sω* × *sω*. For example,

$$\mathbb{A} = \begin{pmatrix} \mathbf{A}\_1 \\ & \ddots \\ & & \mathbf{A}\_{\boldsymbol{\omega}} \end{pmatrix} \tag{6.4}$$

with similar structures for U, F, DU, and DF. These block-diagonal matrices can be written

$$\mathbb{A} = \sum_{i=1}^{\omega} \left( \mathbf{E}_{ii} \otimes \mathbf{A}_{i} \right) \tag{6.5}$$

$$\mathbb{U} = \sum_{i=1}^{\omega} \left( \mathbf{E}_{ii} \otimes \mathbf{U}_{i} \right) \tag{6.6}$$

$$\mathbb{F} = \sum_{i=1}^{\omega} \left( \mathbf{E}_{ii} \otimes \mathbf{F}_{i} \right) \tag{6.7}$$

$$\mathbb{D}\_{\mathrm{U}} = \mathbf{I}\_{s} \otimes \mathbf{D}\_{\mathrm{U}} \tag{6.8}$$

$$\mathbb{D}\_{\rm F} = \mathbf{I}\_s \otimes \mathbf{D}\_{\rm F} \tag{6.9}$$

where **E**<sub>*ii*</sub>, of dimension *ω* × *ω*, contains a 1 in the *(i, i)* entry and zeros elsewhere.

If the demography is strictly stage-dependent, so that **A***<sup>i</sup>* = **A**, for *i* = 1*,...,ω*, then the block-diagonal matrices A, F, and U reduce to, e.g.,

$$\mathbb{A} = \mathbf{I}_{\omega} \otimes \mathbf{A} \tag{6.10}$$

with corresponding expressions for F and U.

The state of the population at time *t* could be described by a 2-dimensional array

$$\mathcal{N}(t) = \begin{pmatrix} n_{11} & \cdots & n_{1\omega} \\ \vdots & & \vdots \\ n_{s1} & \cdots & n_{s\omega} \end{pmatrix} (t) \qquad s \times \omega \tag{6.11}$$

where rows correspond to stages and columns to age classes. However, such a 2-dimensional array cannot be projected directly; instead, it is transformed to a vector,

$$\mathbf{n}(t) = \text{vec}\,\mathcal{N}(t) = \begin{pmatrix} n_{11} \\ \vdots \\ n_{s1} \\ \vdots \\ n_{1\omega} \\ \vdots \\ n_{s\omega} \end{pmatrix} (t) \qquad s\omega \times 1 \tag{6.12}$$

using the vec operator, which stacks the columns of the matrix one above the next. The vector **n**(*t*) created in this way contains the stages arranged within age classes. An alternative configuration, with ages arranged within stages, is obtained by applying the vec operator to *N*<sup>T</sup>:

$$\text{vec}\,\mathcal{N}^{\mathsf{T}}(t) = \begin{pmatrix} n_{11} \\ \vdots \\ n_{1\omega} \\ \vdots \\ n_{s1} \\ \vdots \\ n_{s\omega} \end{pmatrix} (t) \qquad s\omega \times 1 \tag{6.13}$$

The two vectors vec *N* and vec *N*<sup>T</sup> are related by the vec-permutation matrix, or commutation matrix, **K** (Henderson and Searle 1981),

$$\text{vec}\,\mathcal{N}^{\mathsf{T}} = \mathbf{K}\_{\mathsf{s},\omega}\text{vec}\,\mathcal{N} \tag{6.14}$$

(see Sect. 2.2.3). Where no confusion seems likely to arise, we will suppress the subscripts and write **K**<sub>*s,ω*</sub> as **K**. As with any permutation matrix, **K**<sup>T</sup> = **K**<sup>−1</sup>.
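The vec-permutation matrix can be constructed directly from its defining property (6.14); a minimal sketch (the function name `vecperm` is mine, not from the text):

```python
import numpy as np

def vecperm(s, omega):
    """Vec-permutation (commutation) matrix K_{s,omega}:
    K vec(N) = vec(N^T) for any s x omega matrix N."""
    K = np.zeros((s * omega, s * omega))
    for i in range(s):
        for j in range(omega):
            # entry (i,j) of N sits at row j*s+i of vec N
            # and at row i*omega+j of vec N^T
            K[i * omega + j, j * s + i] = 1.0
    return K

s, omega = 3, 4
K = vecperm(s, omega)
Nmat = np.arange(float(s * omega)).reshape((s, omega))
vecN = Nmat.reshape(-1, order='F')      # column-stacking vec
vecNT = Nmat.T.reshape(-1, order='F')
```

As a permutation matrix, `K.T @ K` is the identity, so `K.T` serves as the inverse.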

The goal of the model is to project the age-stage vector **n** = vec *N* from *t* to *t* + 1. The complete projection is given by

$$\mathbf{n}(t+1) = \left(\mathbf{K}^{\mathsf{T}} \mathbb{D}\_{\mathbf{U}} \mathbf{K} \mathbb{U} + \mathbf{K}^{\mathsf{T}} \mathbb{D}\_{\mathbf{F}} \mathbf{K} \mathbb{F}\right) \mathbf{n}(t) \tag{6.15}$$

This deserves some explanation. Consider the first term on the right-hand side, **K**<sup>T</sup>D<sub>U</sub>**K**U. Reading from right to left, it first operates on the vector **n**(*t*) with the block-diagonal matrix U, which moves surviving extant individuals among stages without changing their age. Then the resulting vector is rearranged by the vec-permutation matrix **K** to group individuals by age classes within each stage. The block-diagonal matrix D<sub>U</sub> then moves each surviving individual to the next older age class. Finally, **K**<sup>T</sup> rearranges the vector back to the stage-within-age arrangement of **n**(*t*).

The second term in (6.15), **K**<sup>T</sup>D<sub>F</sub>**K**F, carries out a similar sequence of transformations for the generation of new individuals. First, newborn individuals are produced according to the block-diagonal fertility matrix F. The resulting vector is rearranged by the vec-permutation matrix, and then the matrix D<sub>F</sub> places all the newborn individuals into the first age class. Finally, **K**<sup>T</sup> rearranges the vector to the stage-within-age arrangement.

I will write the age × stage projection matrix in (6.15) as

$$\tilde{\mathbf{A}} = \left(\mathbf{K}^{\mathsf{T}} \mathbb{D}\_{\mathbf{U}} \mathbf{K} \mathbf{U} + \mathbf{K}^{\mathsf{T}} \mathbb{D}\_{\mathbf{F}} \mathbf{K} \mathbb{F}\right) \tag{6.16}$$

$$= \left(\tilde{\mathbf{U}} + \tilde{\mathbf{F}}\right) \tag{6.17}$$

The matrices **A**˜, **U**˜, and **F**˜ that operate on the age-stage vector **n** define the age × stage-classified model and can be subjected to all the usual demographic analyses.
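The complete construction (6.15)–(6.17) can be assembled in a few lines. The sketch below (Python/NumPy; **U** and **F** are hypothetical age-invariant rates, not from the text) also checks a property implicit in the construction: because the columns of **D**<sub>U</sub> and **D**<sub>F</sub> each sum to 1, the stage marginal of **n** is projected by **A** = **U** + **F** itself.

```python
import numpy as np

# Hypothetical age-invariant stage-classified rates (s = 2 stages)
U = np.array([[0.2, 0.0],
              [0.5, 0.7]])
F = np.array([[0.0, 1.4],
              [0.0, 0.0]])
A = U + F
s, omega = 2, 4

# vec-permutation matrix K_{s,omega}: K vec(N) = vec(N^T)
K = np.zeros((s * omega, s * omega))
for i in range(s):
    for j in range(omega):
        K[i * omega + j, j * s + i] = 1.0

# aging matrices (6.2)-(6.3)
D_U = np.diag(np.ones(omega - 1), k=-1); D_U[-1, -1] = 1.0
D_F = np.zeros((omega, omega)); D_F[0, :] = 1.0

# block-diagonal matrices (6.8)-(6.10) and projection matrix (6.16)
bbU, bbF = np.kron(np.eye(omega), U), np.kron(np.eye(omega), F)
bbDU, bbDF = np.kron(np.eye(s), D_U), np.kron(np.eye(s), D_F)
Atilde = K.T @ bbDU @ K @ bbU + K.T @ bbDF @ K @ bbF

# project one step; the stage marginal should follow A
n = np.arange(1.0, s * omega + 1)          # arbitrary age-stage vector
n1 = Atilde @ n
marg = n.reshape((s, omega), order='F').sum(axis=1)
marg1 = n1.reshape((s, omega), order='F').sum(axis=1)
```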

#### **6.3 Sensitivity Analysis**

Age-stage models pose particular challenges for perturbation analysis, because interest naturally focuses on changes in the matrices **F***<sup>i</sup>* and **U***<sup>i</sup>* (*i* = 1*,...,ω*), which are deeply embedded within **F**˜, **U**˜ , and **A**˜ .

Consider a generic dependent variable *ξ* , which is a scalar- or vector-valued function of **A**˜ . In the examples to follow, *ξ* will be either the population growth rate *λ* or the joint distribution of age and stage at death in a cohort, but it could be any variable calculated from **A**˜ . Let *θ* be a vector of parameters; these could be entries of the matrices, or lower-level parameters determining those entries. The goal of perturbation analysis is to obtain the derivative of *ξ* with respect to *θ*,

$$\frac{d\xi}{d\boldsymbol{\theta}^{\mathsf{T}}} = \frac{d\xi}{d\,\text{vec}^{\mathsf{T}}\,\tilde{\mathbf{A}}} \;\frac{d\,\text{vec}\,\tilde{\mathbf{A}}}{d\boldsymbol{\theta}^{\mathsf{T}}}. \tag{6.18}$$

The first term in (6.18) is the derivative of *ξ* with respect to the matrix **A**˜ . If, for example, *ξ* was the dominant eigenvalue *λ*, then this term would be the matrix calculus version of the well-known eigenvalue sensitivity equation.

The second term in (6.18) requires differentiating **A**˜ with respect to the parameters that determine it. From (6.16), write

$$
\tilde{\mathbf{A}} = \mathbf{Q}_{\mathrm{U}}\, \mathbb{U} + \mathbf{Q}_{\mathrm{F}}\, \mathbb{F} \tag{6.19}
$$

where **Q**<sub>U</sub> = **K**<sup>T</sup>D<sub>U</sub>**K** and **Q**<sub>F</sub> = **K**<sup>T</sup>D<sub>F</sub>**K** are the (constant) matrix products appearing in the definitions of **U**˜ and **F**˜ in (6.16).

Differentiating **A**˜ in (6.19) gives

$$d\,\text{vec}\,\tilde{\mathbf{A}} = \left(\mathbf{I}_{s\omega}\otimes\mathbf{Q}_{\mathrm{U}}\right) d\,\text{vec}\,\mathbb{U} + \left(\mathbf{I}_{s\omega}\otimes\mathbf{Q}_{\mathrm{F}}\right) d\,\text{vec}\,\mathbb{F} \tag{6.20}$$

This requires the differentials of U and F. Differentiating U in (6.6) gives

$$d\mathbb{U} = \sum_{i=1}^{\omega} \left( \mathbf{E}_{ii} \otimes d\mathbf{U}_{i} \right) \tag{6.21}$$

Applying the vec operator to *d*U gives

$$d\,\text{vec}\,\mathbb{U} = \sum_{i=1}^{\omega} \left(\mathbf{E}_{ii} \otimes \mathbf{K} \otimes \mathbf{I}_{s}\right) \left(\text{vec}\,\mathbf{I}_{\omega} \otimes \mathbf{I}_{s^{2}}\right) d\,\text{vec}\,\mathbf{U}_{i} \tag{6.22}$$

using the results of Magnus and Neudecker (1985, Theorem 11); see also Klepac and Caswell (2011, Appendix B) on the derivative of the Kronecker product. Differentiation of F proceeds in the same fashion, yielding

$$d\,\text{vec}\,\mathbb{F} = \sum_{i=1}^{\omega} \left(\mathbf{E}_{ii} \otimes \mathbf{K} \otimes \mathbf{I}_{s}\right) \left(\text{vec}\,\mathbf{I}_{\omega} \otimes \mathbf{I}_{s^{2}}\right) d\,\text{vec}\,\mathbf{F}_{i} \tag{6.23}$$

In the special case where U and F are constructed from single stage-classified matrices **U** and **F**, as in (6.10), Eqs. (6.22) and (6.23) simplify even further to

$$d\,\text{vec}\,\mathbb{U} = \left(\mathbf{I}_{\omega} \otimes \mathbf{K} \otimes \mathbf{I}_{s}\right) \left( \text{vec}\,\mathbf{I}_{\omega} \otimes \mathbf{I}_{s^{2}} \right) d\,\text{vec}\,\mathbf{U} \tag{6.24}$$

$$d\,\text{vec}\,\mathbb{F} = \left(\mathbf{I}_{\omega} \otimes \mathbf{K} \otimes \mathbf{I}_{s}\right) \left( \text{vec}\,\mathbf{I}_{\omega} \otimes \mathbf{I}_{s^{2}} \right) d\,\text{vec}\,\mathbf{F} \tag{6.25}$$

Substituting (6.22) and (6.23) into (6.20) and then substituting (6.20) into (6.18) yields the general result for the derivative

$$\begin{aligned} \frac{d\xi}{d\boldsymbol{\theta}^{\mathsf{T}}} &= \frac{d\xi}{d\,\text{vec}^{\mathsf{T}}\tilde{\mathbf{A}}} \left[ \left( \mathbf{I}_{s\omega} \otimes \mathbf{Q}_{\mathrm{U}} \right) \sum_{i=1}^{\omega} \left( \mathbf{E}_{ii} \otimes \mathbf{K} \otimes \mathbf{I}_{s} \right) \left( \text{vec}\,\mathbf{I}_{\omega} \otimes \mathbf{I}_{s^{2}} \right) \frac{d\,\text{vec}\,\mathbf{U}_{i}}{d\boldsymbol{\theta}^{\mathsf{T}}} \right] \\ &\quad+ \frac{d\xi}{d\,\text{vec}^{\mathsf{T}}\tilde{\mathbf{A}}} \left[ \left( \mathbf{I}_{s\omega} \otimes \mathbf{Q}_{\mathrm{F}} \right) \sum_{i=1}^{\omega} \left( \mathbf{E}_{ii} \otimes \mathbf{K} \otimes \mathbf{I}_{s} \right) \left( \text{vec}\,\mathbf{I}_{\omega} \otimes \mathbf{I}_{s^{2}} \right) \frac{d\,\text{vec}\,\mathbf{F}_{i}}{d\boldsymbol{\theta}^{\mathsf{T}}} \right] \end{aligned} \tag{6.26}$$

Notice that (6.26) requires only three pieces of demographic information: the derivatives of **U***<sup>i</sup>* and **F***<sup>i</sup>* with respect to the parameters (whatever those may be in the case at hand) and the sensitivity of the dependent variable *ξ* (whatever that may be) to the elements of the matrix **A**˜ from which it is calculated. All the other pieces of (6.26) are constants. Some of these constant matrices may be large, depending on *s* and *ω*, but they are very sparse; the sparse matrix technology available in MATLAB can be extremely useful in implementation. An alternative formulation of the differentials of the block matrices U and F is given in Caswell and van Daalen (2016).
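As a check on (6.26), the sketch below implements the bracketed terms for *ξ* = *λ* and *θ* = vec **U**<sub>*i*</sub>, using hypothetical age-dependent rates, and compares the resulting derivatives with finite differences. The book's computations use MATLAB; this sketch is Python/NumPy, and for clarity it uses small dense matrices rather than the sparse technology mentioned above.

```python
import numpy as np

s, omega = 2, 3

# Hypothetical age-dependent vital rates U_i, F_i (not from the text)
Ulist = [np.array([[0.1, 0.0], [0.6, 0.5]]) * (1 - 0.1 * i) for i in range(omega)]
Flist = [np.array([[0.0, 1.0 + 0.5 * i], [0.0, 0.0]]) for i in range(omega)]

# constant structural matrices: K_{s,omega}, D_U, D_F, Q_U, Q_F
K = np.zeros((s * omega, s * omega))
for a in range(s):
    for b in range(omega):
        K[a * omega + b, b * s + a] = 1.0
D_U = np.diag(np.ones(omega - 1), k=-1); D_U[-1, -1] = 1.0
D_F = np.zeros((omega, omega)); D_F[0, :] = 1.0
QU = K.T @ np.kron(np.eye(s), D_U) @ K
QF = K.T @ np.kron(np.eye(s), D_F) @ K

def Atilde(Ulist, Flist):
    E = lambda i: np.outer(np.eye(omega)[i], np.eye(omega)[i])   # E_ii
    bbU = sum(np.kron(E(i), Ulist[i]) for i in range(omega))     # eq. (6.6)
    bbF = sum(np.kron(E(i), Flist[i]) for i in range(omega))     # eq. (6.7)
    return QU @ bbU + QF @ bbF                                   # eq. (6.19)

def lam_of(M):
    return np.max(np.abs(np.linalg.eigvals(M)))

At = Atilde(Ulist, Flist)
lam0 = lam_of(At)

# eigenvalue sensitivity (6.28): w^T (x) v^T with v^T w = 1
lam, W = np.linalg.eig(At)
w = np.real(W[:, np.argmax(np.abs(lam))])
lam2, V = np.linalg.eig(At.T)
v = np.real(V[:, np.argmax(np.abs(lam2))])
v = v / (v @ w)
dlam_dvecA = np.kron(w, v)

vecIw = np.eye(omega).reshape(-1, order='F')
h = 1e-6
sens_all = np.zeros((omega, s * s))
fd_all = np.zeros((omega, s * s))
for i in range(omega):
    Eii = np.outer(np.eye(omega)[i], np.eye(omega)[i])
    # d vec Atilde / d vec U_i, from (6.20) and (6.22)
    middle = np.kron(np.kron(Eii, K), np.eye(s)) \
             @ np.kron(vecIw.reshape(-1, 1), np.eye(s * s))
    sens_all[i] = dlam_dvecA @ np.kron(np.eye(s * omega), QU) @ middle
    # finite-difference check
    for b in range(s):
        for a in range(s):
            Up = [M.copy() for M in Ulist]
            Up[i][a, b] += h
            fd_all[i, b * s + a] = (lam_of(Atilde(Up, Flist)) - lam0) / h
```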

#### **6.4 Examples**

Here we consider two examples of sensitivity analysis of age-stage models, used to extract age-classified information from a stage-classified model. The first example derives the sensitivity of the population growth rate *λ* to both age- and stage-specific survival, permitting examination of how selection pressures on senescence-inducing traits vary from stage to stage. The second example is an analysis of the joint distribution of age and stage at death.

These examples are based on a stage-classified model (Parker 2000) for Scotch broom (*Cytisus scoparius*). Scotch broom is a large (up to 4 m tall) leguminous shrub, introduced into North America from Europe in the late nineteenth century. It is an invasive plant, considered a pest in the northwestern parts of North America. Stage-classified demographic models have been used to evaluate potential management policies for the plant (Parker 2000) and to investigate its potential for spatial spread (Neubert and Parker 2004).

The model contains seven stages (stage 1 = seeds, 2 = seedlings, 3 = juveniles, 4 = small adults, 5 = medium adults, 6 = large adults, 7 = extra-large adults), and parameters were estimated at a number of locations in Washington State. As is typical with many perennial plant species, survival is low for seeds and seedlings, but increases dramatically in larger stages. Parker's study presented estimated projection matrices for plants at the edge, at intermediate locations, and at the center of an invading stand. Plants near the center experience more crowding, with resulting reduced rates of survival, growth, and fertility.

#### *6.4.1 Population Growth Rate and Selection Gradients*

The population growth rate *λ*, the stable age or stage distribution **w**, and age or stage-specific reproductive value vector **v** are given by the dominant eigenvalue and corresponding right and left eigenvectors of the population projection matrix, respectively. In evolutionary demography, *λ* measures the fitness of a phenotype, in that it gives the eventual rate at which descendants of an individual with that phenotype will increase. The selection gradient on a vector of traits *θ* is given by

$$\frac{d\lambda}{d\theta^{\mathsf{T}}}\tag{6.27}$$

These gradients play a fundamental role in evolutionary biodemography, whether evolution is conceived of in terms of population genetics, quantitative genetics, adaptive dynamics, or mutation accumulation (e.g., Metz et al. 1992; Dercole and Rinaldi 2008; Rice 2004; Barfield et al. 2011). If the gradient is positive, selection favors an increase in the trait, and vice-versa.

In this application, *ξ* in (6.18) is the dominant eigenvalue *λ*. Let **w** and **v** be the right and left eigenvectors corresponding to *<sup>λ</sup>*, scaled so that **<sup>v</sup>**T**<sup>w</sup>** <sup>=</sup> 1. Then, in (6.26),

$$\frac{d\lambda}{d\text{vec}^{\mathsf{T}}\tilde{\mathbf{A}}} = \mathbf{w}^{\mathsf{T}} \otimes \mathbf{v}^{\mathsf{T}}.\tag{6.28}$$

See Chap. 3 and Caswell (2010).

In this model, the vital rates are functions only of stage; the phenotype is blind to the age of the individual. However, the terms in the summations in (6.26) give the selection gradients on traits that would modify the phenotype at each age. That is,

$$\begin{aligned} \left. \frac{d\lambda}{d\boldsymbol{\theta}^{\mathsf{T}}} \right|_{\text{age}\,=\,i} &= \frac{d\lambda}{d\,\text{vec}^{\mathsf{T}}\tilde{\mathbf{A}}} \left[ \left( \mathbf{I}_{s\omega} \otimes \mathbf{Q}_{\mathrm{U}} \right) \left( \mathbf{E}_{ii} \otimes \mathbf{K} \otimes \mathbf{I}_{s} \right) \left( \text{vec}\,\mathbf{I}_{\omega} \otimes \mathbf{I}_{s^{2}} \right) \frac{d\,\text{vec}\,\mathbf{U}_{i}}{d\boldsymbol{\theta}^{\mathsf{T}}} \right] \\ &\quad+ \frac{d\lambda}{d\,\text{vec}^{\mathsf{T}}\tilde{\mathbf{A}}} \left[ \left( \mathbf{I}_{s\omega} \otimes \mathbf{Q}_{\mathrm{F}} \right) \left( \mathbf{E}_{ii} \otimes \mathbf{K} \otimes \mathbf{I}_{s} \right) \left( \text{vec}\,\mathbf{I}_{\omega} \otimes \mathbf{I}_{s^{2}} \right) \frac{d\,\text{vec}\,\mathbf{F}_{i}}{d\boldsymbol{\theta}^{\mathsf{T}}} \right] \end{aligned} \tag{6.29}$$

Thus, these terms reveal the selection patterns that would operate on a mutation that was able to detect the age of an individual within a given stage, or that affected age differentially depending on the stage of the individual.

To examine the selection gradients on survival, it is necessary to separate survival from inter-stage transitions in **U**. Let *σ* be the vector of stage-specific survival probabilities. The matrix **U** can be written as the product of a matrix **Σ**, containing the survival probabilities on the diagonal, and a matrix **G** of transition probabilities, conditional on survival;

$$\mathbf{U} = \mathbf{G}\boldsymbol{\Sigma}.\tag{6.30}$$

(cf. Chap. 8). If **F** is independent<sup>1</sup> of *σ*, then

$$d\mathbf{U} = \mathbf{G} \, d\Sigma.\tag{6.31}$$

Applying the vec operator gives

$$\begin{aligned} d\,\text{vec}\,\mathbf{U} &= \left(\mathbf{I}_{s} \otimes \mathbf{G}\right) \text{vec}\,\mathcal{D}\left(\mathbf{1}_{s}\, d\boldsymbol{\sigma}^{\mathsf{T}}\right) \\ &= \left(\mathbf{I}_{s} \otimes \mathbf{G}\right) \mathcal{D}\left(\text{vec}\,\mathbf{I}_{s}\right) \left(\mathbf{I}_{s} \otimes \mathbf{1}_{s}\right) d\boldsymbol{\sigma} \end{aligned} \tag{6.32}$$

<sup>1</sup>By assuming that **F** does not depend on *σ*, I am in effect choosing a pre-breeding census and excluding neonatal mortality from *σ*.

which implies that

$$\frac{d\,\text{vec}\,\mathbf{U}}{d\boldsymbol{\sigma}^{\mathsf{T}}} = \left(\mathbf{I}_{s} \otimes \mathbf{G}\right) \mathcal{D}\left(\text{vec}\,\mathbf{I}_{s}\right) \left(\mathbf{I}_{s} \otimes \mathbf{1}_{s}\right) \tag{6.33}$$

Setting *θ* = *σ* and substituting (6.33) and (6.28) into (6.18) gives the selection gradient on *<sup>σ</sup>*. Substituting (6.33) and (6.28) into (6.29), with *<sup>d</sup>*vec **<sup>F</sup>***/dθ*<sup>T</sup> <sup>=</sup> **<sup>0</sup>**, gives the selection gradient on *σ* as a function of age and stage.
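A numerical check of (6.33) is a one-liner; the sketch below uses a hypothetical **G** and *σ*. Since **U** is linear in *σ*, a finite difference recovers the Jacobian exactly, up to rounding.

```python
import numpy as np

s = 3
# Hypothetical conditional transition matrix G (columns sum to 1)
G = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.5, 0.0],
              [0.0, 0.5, 1.0]])
sigma = np.array([0.3, 0.9, 0.8])   # hypothetical stage-specific survival

def U_of(sig):
    return G @ np.diag(sig)          # U = G Sigma, eq. (6.30)

# eq. (6.33): d vec U / d sigma^T = (I ⊗ G) D(vec I)(I ⊗ 1)
I = np.eye(s)
J = np.kron(I, G) @ np.diag(I.reshape(-1)) @ np.kron(I, np.ones((s, 1)))

# finite-difference check (exact here, since U is linear in sigma)
h = 1e-6
fd = np.zeros((s * s, s))
for j in range(s):
    sp = sigma.copy(); sp[j] += h
    fd[:, j] = (U_of(sp) - U_of(sigma)).reshape(-1, order='F') / h
```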

**Results** The projection matrix **A** for Scotch broom<sup>2</sup> is

$$\mathbf{A} = \begin{pmatrix} 0.740 & 0 & 3.400 & 47.1 & 108.700 & 1120.0 & 3339.0 \\ 0.001 & 0.310 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0.350 & 0.310 & 0 & 0 & 0 & 0 \\ 0 & 0.038 & 0.290 & 0.024 & 0 & 0 & 0 \\ 0 & 0 & 0.069 & 0.390 & 0.320 & 0 & 0.091 \\ 0 & 0 & 0 & 0.440 & 0.440 & 0.530 & 0.091 \\ 0 & 0 & 0 & 0 & 0.029 & 0.400 & 0.730 \end{pmatrix} \tag{6.34}$$

The matrix **U** is obtained from **A** by setting all elements in the first row, except for *a*<sub>11</sub>, to zero. The matrix **F** is a 7 × 7 matrix with the elements of row 1, columns 2–7 of **A** in the corresponding positions, and zeros elsewhere. The maximum age was set to *ω* = 30. The aging matrices **D**<sub>U</sub> and **D**<sub>F</sub> are given by (6.2) and (6.3) with *ω* = 30. Because the vital rates do not depend on age, the dominant eigenvalues of **A** and **A**˜ should be identical, and they are; *λ* = 1.268.
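The equality of the dominant eigenvalues of **A** and **A**˜ can be verified directly from the matrix in (6.34); a sketch in Python/NumPy (the book's own computations use MATLAB):

```python
import numpy as np

# Projection matrix (6.34) for Scotch broom (Parker 2000)
A = np.array([
    [0.740, 0,     3.400, 47.1,  108.700, 1120.0, 3339.0],
    [0.001, 0.310, 0,     0,     0,       0,      0],
    [0,     0.350, 0.310, 0,     0,       0,      0],
    [0,     0.038, 0.290, 0.024, 0,       0,      0],
    [0,     0,     0.069, 0.390, 0.320,   0,      0.091],
    [0,     0,     0,     0.440, 0.440,   0.530,  0.091],
    [0,     0,     0,     0,     0.029,   0.400,  0.730]])
s, omega = 7, 30

# split A into transitions U and fertilities F (row 1, columns 2-7)
F = np.zeros_like(A)
F[0, 1:] = A[0, 1:]
U = A - F

# structural matrices: K_{s,omega}, D_U, D_F
K = np.zeros((s * omega, s * omega))
for a in range(s):
    for b in range(omega):
        K[a * omega + b, b * s + a] = 1.0
D_U = np.diag(np.ones(omega - 1), k=-1); D_U[-1, -1] = 1.0
D_F = np.zeros((omega, omega)); D_F[0, :] = 1.0

# age x stage projection matrix (6.16), age-invariant case (6.10)
Atilde = (K.T @ np.kron(np.eye(s), D_U) @ K @ np.kron(np.eye(omega), U)
        + K.T @ np.kron(np.eye(s), D_F) @ K @ np.kron(np.eye(omega), F))

lam_A = np.max(np.abs(np.linalg.eigvals(A)))
lam_At = np.max(np.abs(np.linalg.eigvals(Atilde)))
```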

The selection gradients on stage-specific survival (i.e., sensitivities of *λ* to *σ*) are shown in Fig. 6.1. There is a steady decline with increasing stage, from seeds to medium-sized adults, but then an increase for large and extra-large adults. A quite different pattern emerges when the selection gradients are calculated as functions of both age and stage, using (6.29). These results are shown in Fig. 6.2. The agespecific selection gradients on survival in stages 1–3 are strictly decreasing with age. But the age-specific selection gradients on survival in the adult stages 4–7 *increase* with age, level off, and then decline. The increase is longer and more pronounced in the larger adult stages.

It is now known that this pattern is widespread in plant populations. It appears in all eight of the Scotch broom populations studied by Parker (2000), and in almost all of 36 species of plants examined by Caswell and Salguero-Gómez (2013). It has important implications for the evolution of senescence. Hamilton (1966) showed that the selection gradient on age-specific mortality always decreases with age, and argued that this implied that selection would always lead to senescence. Incorporating stage-dependence as well as age-dependence of the vital

<sup>2</sup>This is the matrix for the Discovery Park population, 1993–1994, edge conditions; taken from the Appendix of Parker (2000).

**Fig. 6.1** Sensitivity of population growth rate *λ* to stage-specific survival probabilities. Calculated for the stage-classified model of Scotch broom (*Cytisus scoparius*) using data from Parker (2000). Stages: 1 = seeds, 2 = seedlings, 3 = juveniles, 4 = small adults, 5 = medium adults, 6 = large adults, 7 = extra-large adults

**Fig. 6.2** Sensitivity of population growth rate *λ* to stage-specific survival as a function of age, for Scotch broom. Stages defined as in Fig. 6.1

rates means that, over some range of ages, the selection gradient increases (*contrasenescent* selection in the terminology of Caswell and Salguero-Gómez 2013). Thus conclusions that follow from the general decline in selection gradients with age may not apply to traits that affect age-specific survival differentially depending on developmental stage. Traits that affect survival in adult stages should postpone senescence for at least some time.

#### *6.4.2 Distributions of Age and Stage at Death*

The pattern of longevity within a population is captured by the probability distribution of the age at death, one of the standard results of age-classified life table analysis. The moments of the age at death and their sensitivity can also be calculated directly from stage-classified models using Markov chain methods (Feichtinger 1971b; Caswell 2001, 2006, 2009; Tuljapurkar and Horvitz 2006; Horvitz and Tuljapurkar 2008); see Chaps. 4 and 5. Here we can go beyond that and get the full joint distribution of stage and age at death, along with the marginal distributions of age at death and stage at death, implied by an age × stage classified model.

To do this, note that the cohort projection matrix **U**˜ describes movement of individuals among transient states of an absorbing Markov chain, where the absorbing state is death, or death classified by stage or age at death. The transition matrix of the chain is

$$
\tilde{\mathbf{P}} = \begin{pmatrix} \tilde{\mathbf{U}} & \mathbf{0} \\ \tilde{\mathbf{M}} & \mathbf{I} \end{pmatrix} \tag{6.35}
$$

By properly structuring **M**˜ , the model can give information about the age, the stage, or the joint distribution of age and stage at death.<sup>3</sup> Each row of **M**˜ corresponds to an absorbing state, and *m*˜*ij* is the probability of a transition from transient state *j* to absorbing state *i*. To compute the distribution of age and stage at death, we define the absorbing states to correspond to the age × stage combination at death. Thus **M**˜ contains probabilities of death on the diagonal and zeros elsewhere,

$$
\tilde{\mathbf{M}} = \mathbf{I}\_{s\omega} - \mathcal{D}\left(\mathbf{1}\_{s\omega}^{\mathsf{T}} \tilde{\mathbf{U}}\right). \tag{6.36}
$$

The fundamental matrix of the Markov chain in (6.35) is

$$
\tilde{\mathbf{N}} = \left(\mathbf{I} - \tilde{\mathbf{U}}\right)^{-1} \tag{6.37}
$$

The *(i, j )* element of **N**˜ is the expected number of visits that an individual in state *j* will make to transient state *i* before death.

Consider the eventual fate of an individual starting in transient state *j* . Let

$$
\tilde{b}\_{ij} = P\left[\text{eventual absorption in } i \mid \text{starting in } j\right] \tag{6.38}
$$

<sup>3</sup>This also leads to a powerful approach, including sensitivity analysis, for cause of death calculations (Caswell and Ouellette 2016, 2018).

The *b*˜ *ij* are the elements of the matrix **B**˜ (*sω* × *sω*) given by

$$
\tilde{\mathbf{B}} = \tilde{\mathbf{M}} \tilde{\mathbf{N}} \tag{6.39}
$$

(Iosifescu 1980, Theorem 3.3; see also Caswell 2001, Section 5.1). Since the absorbing states (the rows of **M**˜ ) correspond to combinations of age and stage at death, column *j* of **B**˜ gives the joint distribution of age and stage at death, starting from state (i.e., age × stage combination) *j* :

$$\tilde{\mathbf{B}}(:,j) = \tilde{\mathbf{B}}\mathbf{e}\_{j} \tag{6.40}$$

using MATLAB notation in which **X***(*:*,j)* is column *j* of **X**, and where **e***<sup>j</sup>* is a vector of length *sω* with a 1 in the *j* th entry and zeros elsewhere. The rows of **B**˜ correspond to combinations of stage and age at death. Summing the rows over stages gives the marginal distribution of *age* at death, starting in column *j* of **B**˜ , as

$$\mathbf{g}\_{j} = \left(\mathbf{I}\_{\omega} \otimes \mathbf{1}\_{s}^{\mathsf{T}}\right) \tilde{\mathbf{B}}(:,j) \qquad \text{ marginal age distribution} \qquad \omega \times 1 \tag{6.41}$$

Similarly, summing over ages gives the marginal distribution of *stage* at death:

$$\mathbf{h}\_{j} = \left(\mathbf{1}\_{\omega}^{\mathsf{T}} \otimes \mathbf{I}\_{s}\right) \tilde{\mathbf{B}}(:,j) \qquad \text{ marginal stage distribution} \qquad s \times 1 \tag{6.42}$$
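The calculation in (6.36)–(6.42) translates directly into a few matrix operations. The following is a minimal sketch in Python/NumPy (the text uses MATLAB notation); the dimensions (*s* = 2 stages, *ω* = 3 age classes) and the stage-transition blocks are hypothetical, and the block-subdiagonal construction of **U**˜ simply moves survivors forward one age class.

```python
import numpy as np

# Hypothetical dimensions and stage-transition blocks U_i (column sums < 1,
# so every state has a positive probability of death).
s, om = 2, 3
rng = np.random.default_rng(1)
Ui = [rng.uniform(0.0, 0.45, (s, s)) for _ in range(om)]

# Age x stage transient matrix U-tilde: survivors advance one age class;
# individuals in the last age class all die.
Ut = np.zeros((s * om, s * om))
for i in range(om - 1):
    Ut[(i + 1) * s:(i + 2) * s, i * s:(i + 1) * s] = Ui[i]

Mt = np.eye(s * om) - np.diag(Ut.sum(axis=0))   # Eq. (6.36): death probabilities
Nt = np.linalg.inv(np.eye(s * om) - Ut)         # Eq. (6.37): fundamental matrix
Bt = Mt @ Nt                                    # Eq. (6.39)

j = 0                    # an age-1, stage-1 individual
col = Bt[:, j]           # joint distribution of age and stage at death, Eq. (6.40)
g = np.kron(np.eye(om), np.ones((1, s))) @ col  # marginal age at death, Eq. (6.41)
h = np.kron(np.ones((1, om)), np.eye(s)) @ col  # marginal stage at death, Eq. (6.42)

assert np.isclose(col.sum(), 1.0)  # death is certain, so each column sums to 1
```

Because absorption is certain, each column of **B**˜ (and hence each marginal) sums to one; this is a useful check on any implementation.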

#### **6.4.2.1 Perturbation Analysis**

In the general sensitivity equation (6.18), the dependent variable is *ξ* = **B**˜(:*, j*). This depends only on **U**˜ , so the first term in (6.18) can be shown to be

$$\frac{d\xi}{d\text{vec}\,\tilde{\mathbf{A}}} = \frac{d\tilde{\mathbf{B}}(:,j)}{d\text{vec}\,\tilde{\mathbf{U}}}\tag{6.43}$$

$$= -\left(\mathbf{e}\_j^\mathsf{T}\tilde{\mathbf{N}}^\mathsf{T}\otimes\mathbf{I}\_{s\omega}\right)\mathcal{D}\left(\text{vec}\,\mathbf{I}\_{s\omega}\right)\left(\mathbf{I}\_{s\omega}\otimes\mathbf{1}\_{s\omega}\mathbf{1}\_{s\omega}^\mathsf{T}\right) + \left(\mathbf{e}\_j^\mathsf{T}\tilde{\mathbf{N}}^\mathsf{T}\otimes\tilde{\mathbf{B}}\right)\tag{6.44}$$

The desired derivative *d***B**˜(:*, j*)*/dθ*<sup>T</sup> is obtained by substituting (6.44) for *dξ/d*vec **A**˜ in (6.26), setting *d*vec **F***i/dθ*<sup>T</sup> = **0**.

The sensitivities of the marginal distributions of age and stage at death are then given by

$$\frac{d\mathbf{g}\_j}{d\boldsymbol{\theta}^\mathsf{T}} = \left(\mathbf{I}\_{\omega} \otimes \mathbf{1}\_s^\mathsf{T}\right) \frac{d\tilde{\mathbf{B}}(:,j)}{d\boldsymbol{\theta}^\mathsf{T}}\tag{6.45}$$

$$\frac{d\mathbf{h}\_j}{d\boldsymbol{\theta}^\mathsf{T}} = \left(\mathbf{1}\_{\omega}^\mathsf{T} \otimes \mathbf{I}\_s\right) \frac{d\tilde{\mathbf{B}}(:,j)}{d\boldsymbol{\theta}^\mathsf{T}}\tag{6.46}$$

**Derivation** To derive the sensitivity of the joint distribution of age and stage at death, conditional on some starting age × stage combination, we start by differentiating equation (6.40) for column *j* of **B**˜ and applying the vec operator,

$$d\tilde{\mathbf{B}}(:,j) = \left(\mathbf{e}\_j^{\mathsf{T}} \otimes \mathbf{I}\_{s\omega}\right) d\text{vec}\,\tilde{\mathbf{B}}.\tag{6.47}$$

However, from (6.39), **B**˜ = **M**˜ **N**˜ , so

$$d\tilde{\mathbf{B}} = \left(d\tilde{\mathbf{M}}\right)\tilde{\mathbf{N}} + \tilde{\mathbf{M}}\left(d\tilde{\mathbf{N}}\right). \tag{6.48}$$

and

$$d\text{vec}\,\tilde{\mathbf{B}} = \left(\tilde{\mathbf{N}}^{\mathsf{T}} \otimes \mathbf{I}\_{s\omega}\right)d\text{vec}\,\tilde{\mathbf{M}} + \left(\mathbf{I}\_{s\omega} \otimes \tilde{\mathbf{M}}\right)d\text{vec}\,\tilde{\mathbf{N}}.\tag{6.49}$$

The differential of the fundamental matrix **N**˜ is

$$d\text{vec}\,\tilde{\mathbf{N}} = \left(\tilde{\mathbf{N}}^{\mathsf{T}} \otimes \tilde{\mathbf{N}}\right) d\text{vec}\,\tilde{\mathbf{U}}\tag{6.50}$$

(Caswell 2006; see Chap. 5). The differential of **M**˜ is obtained by rewriting (6.36) as

$$
\tilde{\mathbf{M}} = \mathbf{I}\_{s\omega} - \mathbf{I}\_{s\omega} \circ \left(\mathbf{1}\_{s\omega} \mathbf{1}\_{s\omega}^{\mathsf{T}} \tilde{\mathbf{U}}\right),
\tag{6.51}
$$

differentiating,

$$d\tilde{\mathbf{M}} = -\mathbf{I}\_{s\omega} \circ \left[ \mathbf{1}\_{s\omega} \mathbf{1}\_{s\omega}^{\mathsf{T}} \left( d\tilde{\mathbf{U}} \right) \right],\tag{6.52}$$

and applying the vec operator to obtain

$$d\text{vec}\,\tilde{\mathbf{M}} = -\mathcal{D}\left(\text{vec}\,\mathbf{I}\_{s\omega}\right)\left(\mathbf{I}\_{s\omega}\otimes\mathbf{1}\_{s\omega}\mathbf{1}\_{s\omega}^{\mathsf{T}}\right)d\text{vec}\,\tilde{\mathbf{U}}\tag{6.53}$$

Substituting (6.50) and (6.53) into (6.49) gives

$$d\text{vec}\,\tilde{\mathbf{B}} = \left[ -\left( \tilde{\mathbf{N}}^{\mathsf{T}} \otimes \mathbf{I}\_{s\omega} \right) \mathcal{D} \left( \text{vec}\,\mathbf{I}\_{s\omega} \right) \left( \mathbf{I}\_{s\omega} \otimes \mathbf{1}\_{s\omega} \mathbf{1}\_{s\omega}^{\mathsf{T}} \right) + \left( \mathbf{I}\_{s\omega} \otimes \tilde{\mathbf{M}} \right) \left( \tilde{\mathbf{N}}^{\mathsf{T}} \otimes \tilde{\mathbf{N}} \right) \right] d\text{vec}\,\tilde{\mathbf{U}} \tag{6.54}$$

Substituting this into (6.47) gives

$$d\tilde{\mathbf{B}}(:,j) = \left[ -\left( \mathbf{e}\_j^{\mathsf{T}} \otimes \mathbf{I}\_{s\omega} \right) \left( \tilde{\mathbf{N}}^{\mathsf{T}} \otimes \mathbf{I}\_{s\omega} \right) \mathcal{D} \left( \text{vec}\,\mathbf{I}\_{s\omega} \right) \left( \mathbf{I}\_{s\omega} \otimes \mathbf{1}\_{s\omega} \mathbf{1}\_{s\omega}^{\mathsf{T}} \right) + \left( \mathbf{e}\_j^{\mathsf{T}} \otimes \mathbf{I}\_{s\omega} \right) \left( \mathbf{I}\_{s\omega} \otimes \tilde{\mathbf{M}} \right) \left( \tilde{\mathbf{N}}^{\mathsf{T}} \otimes \tilde{\mathbf{N}} \right) \right] d\text{vec}\,\tilde{\mathbf{U}} \tag{6.55}$$

Equation (6.55) can be simplified to obtain (6.44), using the fact that

$$(\mathbf{A} \otimes \mathbf{B})\left(\mathbf{C} \otimes \mathbf{D}\right) = \left(\mathbf{A}\mathbf{C} \otimes \mathbf{B}\mathbf{D}\right),$$

provided the products exist.
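Formulas like (6.54) are easy to get wrong in implementation, so a finite-difference check is worthwhile. The following Python/NumPy sketch builds the derivative matrix of (6.54) for a small hypothetical transient matrix **U** (not the age × stage structure of the text) and compares one column against a central difference.

```python
import numpy as np

# Hypothetical small transient matrix U (column sums < 1)
rng = np.random.default_rng(0)
n = 3
U = rng.uniform(0.0, 0.3, (n, n))

def B_of(U):
    M = np.eye(n) - np.diag(U.sum(axis=0))   # Eq. (6.36)
    N = np.linalg.inv(np.eye(n) - U)         # Eq. (6.37)
    return M @ N                             # Eq. (6.39)

I = np.eye(n)
N = np.linalg.inv(I - U)
M = I - np.diag(U.sum(axis=0))
ones = np.ones((n, n))

# Eq. (6.54): d vec B / d vec U (column-major vec throughout)
dvecB = (-np.kron(N.T, I) @ np.diag(I.flatten('F')) @ np.kron(I, ones)
         + np.kron(I, M) @ np.kron(N.T, N))

# Central finite difference for the entry u_{21} of U
eps = 1e-6
i, j = 1, 0                       # zero-based indices of u_{21}
dU = np.zeros((n, n)); dU[i, j] = eps
numeric = (B_of(U + dU) - B_of(U - dU)).flatten('F') / (2 * eps)
assert np.allclose(numeric, dvecB[:, j * n + i], atol=1e-7)
```

Note that the vec operator stacks columns, so the derivative with respect to entry *(i, j)* of **U** sits in column *jn* + *i* (zero-based) of the derivative matrix.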

**Results** Figure 6.3 shows the joint distribution of age and stage at death for a seed of age 1 (one definition of "newborn" in this life cycle), with *ω* = 40. Almost all seeds will die as seeds, because the germination probability is low, *a*21 = 0.001; see (6.34). The fates of seedlings (another possible choice for newborn status) are more diverse, and those of juveniles and small adults even more so; the distributions show what proportion will die as seedlings, juveniles, etc., and at what ages (Fig. 6.3).

The marginal distribution of age at death, for individuals in each initial stage, is given in Fig. 6.4. Not surprisingly, larger stages have an age distribution of death shifted to later ages, including some probability of survival to age class *ω* (≥ 40 years in this calculation).

The sensitivity of **g**<sup>2</sup> (the marginal distribution of age at death for a seedling) is shown in Fig. 6.5. Changes in the survival of seeds (*σ*1) have no effect on this

**Fig. 6.3** The joint probability distribution of age (1*,...,* 10) and stage (1*,...,* 7) at death for an individual seed, seedling, juvenile, or small adult of Scotch broom. Stages as in Fig. 6.1

**Fig. 6.5** Sensitivity of the marginal distribution of age at death, **g**2, to the survival probabilities of each stage, for an individual starting in stage 2 (seedlings). Stages as in Fig. 6.1

distribution, because seedlings have already left the seed stage. Changes in *σ*2–*σ*<sup>7</sup> shift the distribution to progressively older ages, by reducing the probability of death at young ages and increasing it at older ages.

#### **6.5 Discussion**

Models in which individuals are classified by both age and stage extend demographic analyses in several directions. They permit biodemographic analyses of aging to take advantage of the many stage-classified demographic analyses accumulated by ecologists (Salguero-Gómez et al. 2015, 2016). They also permit human demographers to take account of factors other than age in determining mortality, longevity, fertility, and population dynamics.

Age- and stage-specific demographic processes are often combined in demography using multistate life table methods (e.g., Rogers 1975; Willekens 2002, 2014). These are usually focused on cohort dynamics and associated survival statistics (but see Rogers 1975, Chap. 5 for an explicit consideration of population projection). Multistate life table models are written as continuous-parameter, discrete-state Markov chains, where the parameter represents age and the states represent stages. In order to solve the resulting equations, the dynamics must be approximated over a (usually short) finite age interval; this would correspond to the sequence of matrices **A***<sup>i</sup>* in the model here. The age × stage-classified model described by **A**˜ is a way to solve the discretized equations in a single step, and makes possible a variety of analyses that are difficult or impossible in the usual life table formulation. Further investigation of the relation between continuous multistate life table methods and age × stage-classified models will be interesting.

These analyses blur the distinction (Chap. 5) between implicit and explicit age dependence. If the **A***<sup>i</sup>* are truly identical, by definition only implicit age dependence is revealed. But the structure of the age × stage model separates all of the age-dependent **A***i*, and thus is ready to include any degree of explicit joint dependence of the vital rates on age and stage.

Given sufficient longitudinal data on both age and stage, it is possible to estimate the stage-specific matrices **A***<sup>i</sup>* as explicit functions of age; see Peeters et al. (2002) for an example of a study of human heart disease, and Lebreton et al. (2009) for a review of methods used in multistate capture-mark-recapture analysis in ecology. Needless to say, the data requirements for a full age × stage parameterization are challenging. I suspect that the development of estimation methods at intermediate levels of detail will be an important step.

#### *6.5.1 Reducibility and Ergodicity*

The properties of **A**˜ raise an important theoretical and technical issue regarding population growth, fitness, and selection gradients. The use of *λ* as a measure of fitness is usually justified by the strong ergodic theorem (Cohen 1979, Caswell 2001, Section 4.5.2), which guarantees the eventual convergence to the stable population structure and growth at a rate given by the dominant eigenvalue *λ*. A sufficient condition for this convergence is that the projection matrix be irreducible; i.e., that there exist a pathway connecting any two stages. Stott et al. (2010) surveyed published population projection matrices and found that reducible matrices were not uncommon, and explored the implications for ergodicity. Reducible matrices are not as bad as some people think, but it is important to understand their implications, especially for age × stage models.

General results about the irreducibility of block-structured matrices are difficult; see Csetenyi and Logofet (1989), Logofet (1993, Chap. 3), and Logofet and Belova (2007) for some important graph-theoretical results. However, the age × stage matrices developed here are unusual among population models in that they are (almost) always reducible, because they contain categories to which there are no possible pathways. This arises because age 1 individuals are produced only by reproduction. Hence there can never be age 1 individuals in any stage that is not produced by reproduction. For example, Scotch broom reproduces only by seeds, so age 1 seeds appear in the model. However, the matrix **A**˜ also contains entries corresponding to age 1 seedlings, age 1 juveniles, age 1 adults, etc. These do not exist, and because there are no pathways to these stages from any other stages, the matrix **A**˜ is reducible.

The Perron-Frobenius theorem guarantees that a reducible non-negative matrix will have a real, non-negative, dominant eigenvalue that is at least as large as any of the others. However, the asymptotic population growth rate and structure may depend on initial conditions (Caswell 2001, Section 4.5.4). This means that one must ascertain that the eigenvalues and eigenvectors under analysis correspond to initial conditions of interest.

Appendix A shows that a necessary and sufficient condition for population growth to be described by the dominant eigenvalue *λ* of **A**˜ , regardless of the (non-negative and non-zero) initial population vector, is that the left eigenvector **v** be strictly positive, and that this corresponds to a particular block-triangular form of **A**˜ . This provides a simple check for the ergodicity of population growth, and justifies the use of *λ* as a population growth rate and measure of fitness.

Primitivity may be difficult to evaluate for an age × stage matrix (but see Logofet 1993), but as with any projection matrix model, the long-term average growth rate of an imprimitive (but irreducible) matrix is still given by the dominant real eigenvalue.

The matrix **A**˜ for Scotch broom in (6.34) is reducible, as shown by calculating (**I***sω* + **A**˜ )*sω*−<sup>1</sup> and finding that this matrix contains zeros (Caswell 2001). However, the left eigenvector **v** is strictly positive, so we know that the population eventually grows at the rate *λ* regardless of initial conditions.
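Both checks are easy to automate. The sketch below, in Python/NumPy with a hypothetical 3 × 3 projection matrix (its first stage receives no input, mimicking the unreachable "age-1 seedling" states of the text), tests irreducibility via positivity of (**I** + **A**)*s*−1 and ergodicity via strict positivity of the dominant left eigenvector.

```python
import numpy as np

# Hypothetical reducible projection matrix: no pathway leads into stage 1
# (its row has no off-diagonal entries), as with the age-1 states in the text.
A = np.array([[0.1, 0.0, 0.0],
              [0.4, 0.2, 1.5],
              [0.0, 0.6, 0.3]])
s = A.shape[0]

# Irreducibility test: A is irreducible iff (I + A)^(s-1) is strictly positive
power = np.linalg.matrix_power(np.eye(s) + A, s - 1)
reducible = bool(np.any(power == 0))

# Ergodicity check: left eigenvector v of the dominant eigenvalue
# (right eigenvector of A^T; |.| fixes the arbitrary sign of the numerical result)
eigvals, V = np.linalg.eig(A.T)
lead = np.argmax(eigvals.real)
v = np.abs(V[:, lead].real)
ergodic = bool(np.all(v > 1e-9))
```

For this example the matrix is reducible, yet **v** is strictly positive, so growth is ergodic despite the reducibility, just as for the Scotch broom matrix.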

## *6.5.2 A Protocol for Age* **×** *Stage-Classified Models*

The approach outlined here gives a step-by-step procedure for constructing and analyzing age × stage-classified matrix population models.

	- (a) choose a dependent variable *ξ* and a vector of parameters *θ*,
	- (b) compute the sensitivity matrix *dξ/d*vec<sup>T</sup> **A**˜ ,
	- (c) compute the matrices:

$$\frac{d\text{vec}\,\mathbf{A}\_i}{d\boldsymbol{\theta}^\mathsf{T}}, \ \frac{d\text{vec}\,\mathbf{U}\_i}{d\boldsymbol{\theta}^\mathsf{T}}, \ \text{and} \ \frac{d\text{vec}\,\mathbf{F}\_i}{d\boldsymbol{\theta}^\mathsf{T}}$$

(d) compute *dξ/dθ*<sup>T</sup> according to (6.18).

The explicit connection between matrix population models and absorbing Markov chain theory makes it possible to analyze both population dynamics and cohort dynamics in a unified framework (cf. Feichtinger 1971a; Caswell 2001, 2006, 2009). Cohort dynamics are, in essence, the demography of individuals. It may seem paradoxical to speak of the demography of individuals, but that is what it is, because the *statistical* properties of a cohort (e.g., average lifespan) are *probabilistic* properties of an individual (e.g., life expectancy). Demography in general, and matrix population models in particular, provides the link between the individual and the population.

#### **A Appendix: Population Growth and Reducible Matrices**

Some ergodic properties of population growth under the action of reducible matrices are described by Caswell (2001, Section 4.5.4). Here we can extend the analysis.

Let **A** be a reducible non-negative projection matrix. By permutation of its rows and columns (i.e., renumbering the stages in the life cycle), **A** can be transformed to a block lower-triangular form. Here is an example:

$$\mathbf{A} = \begin{pmatrix} \mathbf{B}\_{11} & 0 & 0 & 0 \\ \mathbf{B}\_{21} & \mathbf{B}\_{22} & 0 & 0 \\ \mathbf{B}\_{31} & \mathbf{B}\_{32} & \mathbf{B}\_{33} & 0 \\ \mathbf{B}\_{41} & \mathbf{B}\_{42} & \mathbf{B}\_{43} & \mathbf{B}\_{44} \end{pmatrix}. \tag{6.56}$$

In this form, all the diagonal blocks **B***ii* are either irreducible matrices or 1 × 1 (i.e. scalar) zero matrices. The block triangular form is unique, up to a renumbering of the blocks and permutation of indices within blocks (Gantmacher 1959). It corresponds to a decomposition of the state space into a set of subspaces; let *Ri* be the subspace corresponding to the block **B***ii*.

Some or all of the subdiagonal blocks in (6.56) may be zero. For reasons that will become apparent, consider an example where **B**<sup>21</sup> = **B**<sup>43</sup> = **0**; i.e.,

$$\mathbf{A} = \begin{pmatrix} \mathbf{B}\_{11} & 0 & 0 & 0 \\ 0 & \mathbf{B}\_{22} & 0 & 0 \\ \mathbf{B}\_{31} & \mathbf{B}\_{32} & \mathbf{B}\_{33} & 0 \\ \mathbf{B}\_{41} & \mathbf{B}\_{42} & 0 & \mathbf{B}\_{44} \end{pmatrix} \tag{6.57}$$

Gantmacher (1959, Section 13.4) calls a block **B***ii isolated* if there are no other non-zero blocks on its row, that is, if **B***ij* = 0 for *j < i*. I will call such a block *row-isolated*, and introduce the term *column-isolated* to describe any block **B***ii* with no other non-zero blocks in its column, that is, **B***ji* = 0 for *j > i*. In the matrix in (6.57), the blocks **B**<sup>11</sup> and **B**<sup>22</sup> are row-isolated and the blocks **B**<sup>33</sup> and **B**<sup>44</sup> are column-isolated.

If **B***ii* is row-isolated, then the life cycle graph contains no pathways from any state outside of the subspace *Ri* to any state inside *Ri*, and *Ri* is a source. If **B***ii* is column-isolated, then the life cycle graph contains no pathways from any state in *Ri* to any state outside *Ri*, and *Ri* is a sink.

The eigenvalues of **A** are the eigenvalues of the diagonal blocks **B***ii*. Let *λ*<sup>1</sup> be the dominant eigenvalue of **A**, with right and left eigenvectors **w**<sup>1</sup> and **v**1. The Perron-Frobenius theorem guarantees that *λ*1, **w**1, and **v**<sup>1</sup> are real and non-negative. Gantmacher (1959, Chap. 13, Theorem 6) proves that the eigenvector **w**<sup>1</sup> is strictly positive if and only if *λ*<sup>1</sup> is an eigenvalue of every row-isolated block, and is not an eigenvalue of any of the non-row-isolated blocks. This makes it easy to demonstrate the following corollary.

**Corollary: Positivity of v1** Let **v**<sup>1</sup> be the left eigenvector corresponding to *λ*1[**A**]. Then **v**<sup>1</sup> is strictly positive if and only if *λ*1[**A**] is an eigenvalue of every columnisolated block, and is not an eigenvalue of any non-column-isolated block.

To see this, note that **v**<sup>1</sup> is the right eigenvector of **A**T. The column-isolated blocks of **A** become row-isolated blocks of the block lower-triangular form of **A**T, and application of Gantmacher's Theorem 6 proves the Corollary.

For example, transposing (6.57) gives

$$\mathbf{A}^{\mathsf{T}} = \begin{pmatrix} \mathbf{B}\_{11}^{\mathsf{T}} & 0 & \mathbf{B}\_{31}^{\mathsf{T}} & \mathbf{B}\_{41}^{\mathsf{T}} \\ 0 & \mathbf{B}\_{22}^{\mathsf{T}} & \mathbf{B}\_{32}^{\mathsf{T}} & \mathbf{B}\_{42}^{\mathsf{T}} \\ 0 & 0 & \mathbf{B}\_{33}^{\mathsf{T}} & 0 \\ 0 & 0 & 0 & \mathbf{B}\_{44}^{\mathsf{T}} \end{pmatrix} \tag{6.58}$$

Reversing the order of the rows and columns gives the block lower-triangular form

$$
\begin{pmatrix}
\mathbf{B}\_{44}^{\mathsf{T}} & 0 & 0 & 0 \\
0 & \mathbf{B}\_{33}^{\mathsf{T}} & 0 & 0 \\
\mathbf{B}\_{42}^{\mathsf{T}} & \mathbf{B}\_{32}^{\mathsf{T}} & \mathbf{B}\_{22}^{\mathsf{T}} & 0 \\
\mathbf{B}\_{41}^{\mathsf{T}} & \mathbf{B}\_{31}^{\mathsf{T}} & 0 & \mathbf{B}\_{11}^{\mathsf{T}}
\end{pmatrix}
\tag{6.59}
$$

The column-isolated blocks in **A** (**B**<sup>33</sup> and **B**44) now appear as row-isolated blocks in **A**T. Gantmacher's result shows that the eigenvector **v**<sup>1</sup> will be positive if and only if *λ*<sup>1</sup> is an eigenvalue of each of those blocks.

The usefulness of the Corollary follows from the population projection model

$$\mathbf{n}(t+1) = \mathbf{A}\mathbf{n}(t) \qquad \mathbf{n}(0) = \mathbf{n}\_0 \tag{6.60}$$

and its solution4

$$\mathbf{n}(t) = \sum\_{i=1}^{s} c\_i \lambda\_i^t \mathbf{w}\_i \tag{6.61}$$

$$= \sum\_{i=1}^{s} \left( \mathbf{v}\_i^{\mathsf{T}} \mathbf{n}\_0 \right) \lambda\_i^t \mathbf{w}\_i \tag{6.62}$$

Caswell (2001). If **n**0 is such that *c*1 = **v**1<sup>T</sup>**n**0 is positive, then *λ*1*<sup>t</sup>* will eventually dominate all other terms in the solution and the population will grow at the rate *λ*1 with stable structure **w**1. We know the following about *c*1:

- if **A** is irreducible, then **v**1 is strictly positive, and *c*1 *>* 0 for every non-negative, non-zero **n**0;
- if **A** is reducible but **v**1 is strictly positive, then again *c*1 *>* 0 for every non-negative, non-zero **n**0;
- if **A** is reducible and **v**1 contains zero entries, then *c*1 = 0 for any **n**0 whose non-zero entries are confined to the zero entries of **v**1, and *c*1 *>* 0 otherwise.

In the first two cases, population growth is ergodic from any non-zero initial population. In the third case, there exists a basin of attraction leading to growth according to *λ*1, and a basin (or basins) of attraction for growth according to the dominant eigenvalues of the diagonal blocks **B***ii* corresponding to the zero entries of **v**1.

<sup>4</sup>This holds provided that **A** is diagonalizable, which is a generic property for linear operators (Hirsch and Smale 1974, p. 157).
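The spectral solution (6.61)–(6.62) can be verified numerically in a few lines. The sketch below, in Python/NumPy with a hypothetical 3 × 3 (diagonalizable) projection matrix, exploits the fact that the rows of **W**−1 are the left eigenvectors scaled so that **v***i*<sup>T</sup>**w***i* = 1, which is exactly the normalization implied by *c*i = **v***i*<sup>T</sup>**n**0.

```python
import numpy as np

# Hypothetical projection matrix and initial vector
A = np.array([[0.0, 1.0, 2.0],
              [0.6, 0.0, 0.0],
              [0.0, 0.5, 0.1]])
n0 = np.array([1.0, 0.0, 0.0])
t = 15

lam, W = np.linalg.eig(A)      # right eigenvectors w_i in the columns of W
V = np.linalg.inv(W)           # rows: left eigenvectors scaled so v_i^T w_i = 1
c = V @ n0                     # coefficients c_i = v_i^T n0, Eq. (6.62)

n_spectral = W @ (c * lam**t)                    # Eq. (6.61)
n_iterated = np.linalg.matrix_power(A, t) @ n0   # direct projection
assert np.allclose(n_spectral, n_iterated)
```

The two trajectories agree to machine precision (the imaginary parts contributed by any complex conjugate eigenvalue pairs cancel).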

#### **Bibliography**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Part III Time-Varying and Stochastic Models**

## **Chapter 7 Transient Population Dynamics**

#### **7.1 Introduction**

Short-term, transient population dynamics can differ in important ways from long-term asymptotic dynamics. Just as perturbation analysis (sensitivity and elasticity) of the asymptotic growth rate reveals the effects of the vital rates on long-term growth (Chap. 3), the perturbation analysis of transient dynamics can reveal the determinants of short-term patterns. This chapter presents a comprehensive approach to transient sensitivity analysis that applies to linear time-invariant, time-varying, subsidized, stochastic, nonlinear, and spatial models.

In a constant environment, once a population converges to its stable stage structure, it grows exponentially at a constant rate *λ*. However, depending on initial conditions, short-term transient dynamics can differ from the asymptotic dynamics. It has long been recognized that a focus on *λ* alone can obscure these important transient effects (e.g., Lotka 1939; Coale 1972). There have been attempts to develop transient sensitivity analyses using all the eigenvalues of the projection matrix (Fox and Gurevitch 2000), but these are complicated to calculate and limited in application. Matrix calculus allows us to do better (Caswell 2007).

#### **7.2 Time-Invariant Models**

Armed with matrix calculus, consider the linear time-invariant model,

$$\mathbf{n}(t+1) = \mathbf{A}\mathbf{n}(t) \qquad \mathbf{n}(0) = \mathbf{n}\_0,\tag{7.1}$$

Chapter 7 is modified, by permission of John Wiley and Sons, from Caswell, H. 2007. Sensitivity analysis of transient population dynamics. Ecology Letters 10:1–15.

H. Caswell, *Sensitivity Analysis: Matrix Methods in Demography and Ecology*, Demographic Research Monographs, https://doi.org/10.1007/978-3-030-10534-1\_7

where **n** is *s* × 1 and **A** is *s* × *s*, with *s* the number of stages. Assume that **A** = **A**[*θ*] depends on a *p* × 1 vector of parameters *θ*, which could be entries of **A**, lower-level parameters, or elements of the initial vector.

The sequence of matrices

$$\frac{d\mathbf{n}(t)}{d\theta^{\mathsf{T}}} \qquad t = 1, 2, \ldots \tag{7.2}$$

gives the effect of all the parameters on all the entries of **n***(t)*. From it we can calculate the sensitivities and elasticities of other dependent variables (Sect. 7.3).

We differentiate the model (7.1), obtaining

$$d\mathbf{n}(t+1) = \mathbf{A} \, d\mathbf{n}(t) + (d\mathbf{A})\, \mathbf{n}(t),\tag{7.3}$$

and then apply the vec operator to both sides, remembering that since **n** is a vector, vec **n** = **n**,

$$d\mathbf{n}(t+1) = \mathbf{A}\,d\mathbf{n}(t) + \left(\mathbf{n}^{\mathsf{T}}(t) \otimes \mathbf{I}\_{s}\right)d\text{vec}\,\mathbf{A}.\tag{7.4}$$

Then the first identification theorem and the chain rule, from Eqs. (2.47) and (2.18), give the sensitivity of **n***(t* + 1*)* to the elements of **A**,

$$\frac{d\mathbf{n}(t+1)}{d\text{vec}^{\,\mathsf{T}}\mathbf{A}} = \mathbf{A}\frac{d\mathbf{n}(t)}{d\text{vec}^{\,\mathsf{T}}\mathbf{A}} + \left(\mathbf{n}^\mathsf{T}(t)\otimes\mathbf{I}\_s\right). \tag{7.5}$$

The chain rule extends (7.5) to give the sensitivity to lower-level parameters,

$$\frac{d\mathbf{n}(t+1)}{d\boldsymbol{\theta}^{\mathsf{T}}} = \frac{d\mathbf{n}(t+1)}{d\text{vec}^{\,\mathsf{T}}\mathbf{A}} \frac{d\text{vec}\,\mathbf{A}}{d\boldsymbol{\theta}^{\mathsf{T}}}$$

$$=\mathbf{A}\frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}} + \left(\mathbf{n}^{\mathsf{T}}(t)\otimes\mathbf{I}\_{s}\right)\frac{d\text{vec}\,\mathbf{A}}{d\boldsymbol{\theta}^{\mathsf{T}}}.\tag{7.6}$$

Equations (7.5) and (7.6) are matrix difference equations in the sensitivities of **n***(t)* to the elements of vec **A** or of *θ*. If we know *d***n***(t)/dθ* <sup>T</sup> and **n***(t)*, we can calculate *d***n***(t* + 1*)/dθ* <sup>T</sup> and **n***(t* + 1*)* and continue this iteration to obtain the transient sensitivities at any time. If the parameters in *θ* affect the vital rates but not the initial population, the appropriate initial condition for this iteration is

$$\frac{d\mathbf{n}(0)}{d\boldsymbol{\theta}^{\mathsf{T}}} = \mathbf{0}\_{\boldsymbol{s}\times p}.\tag{7.7}$$

If *θ* affects only the initial population, then

$$\frac{d\mathbf{n}(0)}{d\boldsymbol{\theta}^{\mathsf{T}}} = \mathbf{I}\_s \tag{7.8}$$

gives the sensitivity of transient dynamics to a change in initial conditions.
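The iteration described by (7.5)–(7.7) is a few lines of code. The following Python/NumPy sketch takes *θ* = vec **A** (so *d*vec **A***/dθ*<sup>T</sup> = **I**, a simplifying assumption) for a hypothetical 3 × 3 matrix, and checks one column of the result against a central finite difference.

```python
import numpy as np

# Hypothetical projection matrix and initial population
A = np.array([[0.0, 1.2, 2.0],
              [0.5, 0.0, 0.0],
              [0.0, 0.7, 0.1]])
s = A.shape[0]
n0 = np.array([10.0, 5.0, 1.0])
T = 10

def project(A, n0, T):
    """Iterate n(t+1) = A n(t) together with the sensitivity recursion (7.5),
    starting from dn(0)/d vec A = 0 (Eq. 7.7)."""
    n = n0.copy()
    dn = np.zeros((s, s * s))
    for _ in range(T):
        dn = A @ dn + np.kron(n[None, :], np.eye(s))  # uses n(t), Eq. (7.5)
        n = A @ n
    return n, dn

n_T, dn_T = project(A, n0, T)

# Central finite difference for d n(T) / d a_{21}
eps = 1e-6
Ap, Am = A.copy(), A.copy()
Ap[1, 0] += eps
Am[1, 0] -= eps
numeric = (project(Ap, n0, T)[0] - project(Am, n0, T)[0]) / (2 * eps)
analytic = dn_T[:, 1]   # column-major vec index of entry (2,1) is 1
assert np.allclose(numeric, analytic, rtol=1e-5, atol=1e-6)
```

The order of operations inside the loop matters: the sensitivity update must use **n***(t)*, so it is applied before **n** itself is projected forward.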

#### **7.3 Sensitivity of What? Choosing Dependent Variables**

The sensitivity of other dependent variables may be more interesting than that of **n***(t)*. In an early (and relatively crude) transient analysis, Caswell and Werner (1978) analyzed the transient dynamics of the plant teasel (*Dipsacus sylvestris*) in terms of rosette area at time *t* (which might affect resistance to invasion by later successional species) and cumulative seed production up to time *t* (which might affect colonization of new sites). For a weedy species like teasel, either of these dependent variables might be more relevant than the asymptotic growth rate.

Here are some other biologically interesting dependent variables. They are easy to calculate from *d***n***(t)/dθ* <sup>T</sup> .

1. Population density, as measured by a weighted sum of stage densities. Let **c** ≥ 0 be a weight vector. Then population density is *N (t)* = **c**<sup>T</sup> **n***(t)*. This includes total density (**c** = **1***s*, a vector of ones), the density of a subset of stages (*ci* = 1 for stages to be counted; *ci* = 0 otherwise), biomass (*ci* is the biomass of stage *i*), basal area, metabolic rate, etc. The sensitivity of *N (t)* is

$$\frac{dN(t)}{d\boldsymbol{\theta}^{\mathsf{T}}} = \mathbf{c}^{\mathsf{T}} \frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}}.\tag{7.9}$$

2. Ratios measuring the relative abundances of different stages:

$$R(t) = \frac{\mathbf{a}^{\mathsf{T}}\mathbf{n}(t)}{\mathbf{b}^{\mathsf{T}}\mathbf{n}(t)},\tag{7.10}$$

where **a** and **b** are weight vectors. Examples include the dependency ratio (in human demography, the ratio of the individuals below 15 or above 65 to those between 15 and 65), the sex ratio in a two-sex model, and the ratio of juveniles to adults, which is important in wildlife management (Williams et al. 2002; Skalski et al. 2005). The sensitivity of *R(t)* is

$$\frac{d\boldsymbol{R}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\frac{\mathbf{b}^{\mathsf{T}}\mathbf{n}(t)\mathbf{a}^{\mathsf{T}} - \mathbf{a}^{\mathsf{T}}\mathbf{n}(t)\mathbf{b}^{\mathsf{T}}}{\left(\mathbf{b}^{\mathsf{T}}\mathbf{n}(t)\right)^{2}}\right)\frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}}.\tag{7.11}$$

3. Cumulative density up to a specified time,

$$C(t) = \sum\_{i=0}^{t} \mathbf{c}^{\mathsf{T}} \mathbf{n}(i),\tag{7.12}$$

the sensitivity of which is

$$\frac{dC(t)}{d\theta^{\sf T}} = \mathbf{c}^{\sf T} \sum\_{i=0}^{t} \frac{d\mathbf{n}(i)}{d\theta^{\sf T}}.\tag{7.13}$$


4. Average density over an interval,

$$\bar{N}(t\_1, t\_2) = \frac{1}{t\_2 - t\_1} \sum\_{i=t\_1}^{t\_2} N(i),\tag{7.14}$$

the sensitivity of which is

$$\frac{d\bar{N}(t\_1, t\_2)}{d\boldsymbol{\theta}^{\mathsf{T}}} = \frac{1}{t\_2 - t\_1} \sum\_{i=t\_1}^{t\_2} \mathbf{c}^{\mathsf{T}} \frac{d\mathbf{n}(i)}{d\boldsymbol{\theta}^{\mathsf{T}}}.\tag{7.15}$$

5. Maximum (or minimum) density over an interval,

$$M(t\_1, t\_2) = \max\_{t\_1 \le i \le t\_2} N(i). \tag{7.16}$$

Let *t*˜ be the time such that *M(t*1*, t*2*)* = *N(t*˜*)*. Then, except in the unlikely event of ties,

$$\frac{dM(t\_1, t\_2)}{d\boldsymbol{\theta}^{\mathsf{T}}} = \mathbf{c}^{\mathsf{T}} \frac{d\mathbf{n}(\tilde{t})}{d\boldsymbol{\theta}^{\mathsf{T}}} \tag{7.17}$$

with a similar expression for the minimum.

6. Variance in density over an interval *t*<sup>1</sup> ≤ *t* ≤ *t*2,

$$V(t\_1, t\_2) = \frac{1}{t\_2 - t\_1} \sum\_{i=t\_1}^{t\_2} N^2(i) - \left[\bar{N}(t\_1, t\_2)\right]^2. \tag{7.18}$$

The sensitivity of *V* is

$$\frac{dV(t\_1, t\_2)}{d\theta^\mathsf{T}} = \frac{2}{t\_2 - t\_1} \left[ \sum\_{i=t\_1}^{t\_2} N(i) \frac{dN(i)}{d\theta^\mathsf{T}} - \bar{N}(t\_1, t\_2) \sum\_{i=t\_1}^{t\_2} \frac{dN(i)}{d\theta^\mathsf{T}} \right] \tag{7.19}$$

$$=\frac{2}{t\_2 - t\_1} \left[ \sum\_{i=t\_1}^{t\_2} \left( N(i) - \bar{N}(t\_1, t\_2) \right) \frac{dN(i)}{d\boldsymbol{\theta}^{\mathsf{T}}} \right]. \tag{7.20}$$

7. The transient population growth rate at time *t*,

$$r(t) = \log \frac{N(t+1)}{N(t)}.\tag{7.21}$$


The sensitivity of *r* is

$$\frac{dr(t)}{d\boldsymbol{\theta}^{\mathsf{T}}} = \frac{\mathbf{c}^{\mathsf{T}}}{N(t+1)} \frac{d\mathbf{n}(t+1)}{d\boldsymbol{\theta}^{\mathsf{T}}} - \frac{\mathbf{c}^{\mathsf{T}}}{N(t)} \frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}}.\tag{7.22}$$

8. Average growth rate over an interval *t*<sup>1</sup> ≤ *t* ≤ *t*2,

$$\bar{r}(t\_1, t\_2) = \frac{1}{t\_2 - t\_1} \log \frac{N(t\_2)}{N(t\_1)},\tag{7.23}$$

the sensitivity of which is

$$\frac{d\bar{r}(t\_1, t\_2)}{d\theta^\mathsf{T}} = \frac{1}{t\_2 - t\_1} \left( \frac{\mathbf{c}^\mathsf{T}}{N(t\_2)} \frac{d\mathbf{n}(t\_2)}{d\theta^\mathsf{T}} - \frac{\mathbf{c}^\mathsf{T}}{N(t\_1)} \frac{d\mathbf{n}(t\_1)}{d\theta^\mathsf{T}} \right). \tag{7.24}$$
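Given a trajectory and its sensitivities, each of these dependent variables takes only a few lines of code. A minimal numpy sketch, using a hypothetical 2-stage matrix whose fertility element is the single (scalar) parameter, and checking the variance sensitivity (7.19)–(7.20) against a finite difference:

```python
import numpy as np

# A hypothetical 2-stage model; the single parameter theta is the fertility
# element a12. The sensitivities dn(t)/dtheta come from iterating the linear
# sensitivity recursion specialized to a scalar parameter.
def project(theta, n0, T):
    A = np.array([[0.0, theta],
                  [0.6, 0.8]])
    dA = np.array([[0.0, 1.0],
                   [0.0, 0.0]])             # dA/dtheta
    n, dn = n0.copy(), np.zeros_like(n0)    # dn(0)/dtheta = 0
    traj, straj = [n.copy()], [dn.copy()]
    for _ in range(T):
        dn = A @ dn + dA @ n                # sensitivity recursion
        n = A @ n
        traj.append(n.copy())
        straj.append(dn.copy())
    return np.array(traj), np.array(straj)

theta, n0, T = 1.5, np.array([1.0, 0.0]), 20
traj, straj = project(theta, n0, T)
c = np.ones(2)                              # unweighted density
N, dN = traj @ c, straj @ c                 # N(i) and dN(i)/dtheta

t1, t2 = 5, 15
Nbar = N[t1:t2 + 1].sum() / (t2 - t1)                        # Eq. (7.14)
V = (N[t1:t2 + 1] ** 2).sum() / (t2 - t1) - Nbar ** 2        # Eq. (7.18)
dV = 2.0 / (t2 - t1) * ((N[t1:t2 + 1] - Nbar) * dN[t1:t2 + 1]).sum()  # (7.20)

# Finite-difference check of dV/dtheta
def variance(th):
    Nh = project(th, n0, T)[0] @ c
    Nb = Nh[t1:t2 + 1].sum() / (t2 - t1)
    return (Nh[t1:t2 + 1] ** 2).sum() / (t2 - t1) - Nb ** 2

h = 1e-6
fd = (variance(theta + h) - variance(theta - h)) / (2 * h)
```

All numerical values here are made up for illustration; the check is that the analytic sensitivity `dV` matches the finite difference `fd`.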

#### **7.4 Elasticity Analysis**

Transient elasticities are easily calculated from the sensitivities. The elasticity of *ni(t)* to *θj* is

$$\frac{\epsilon n\_{i}(t)}{\epsilon \theta\_{j}} = \frac{\theta\_{j}}{n\_{i}(t)} \frac{dn\_{i}(t)}{d\theta\_{j}}.\tag{7.25}$$

Creating a matrix of these elasticities requires multiplying column *j* of *d***n***/dθ* <sup>T</sup> by *θj* and dividing row *i* by *ni*. This is just

$$\mathcal{D}\left[\mathbf{n}(t)\right]^{-1} \frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}}\, \mathcal{D}\left[\boldsymbol{\theta}\right],\tag{7.26}$$

where D [**x**] is a matrix with **x** on the diagonal and zeros elsewhere. The elasticity of any other (scalar- or vector-valued) dependent variable *f (***n***(t))* is given by

$$\mathcal{D}\left[f(\mathbf{n}(t))\right]^{-1} \frac{df(\mathbf{n}(t))}{d\boldsymbol{\theta}^{\mathsf{T}}}\, \mathcal{D}\left[\boldsymbol{\theta}\right].\tag{7.27}$$
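The conversion from sensitivities to elasticities is just a pair of diagonal scalings. A small numpy sketch, with made-up values for **n**(t), *θ*, and the sensitivity matrix:

```python
import numpy as np

# Eq. (7.26) as two diagonal scalings: row i of dn/dtheta' is divided by
# n_i(t), and column j is multiplied by theta_j. All values are made up.
n = np.array([2.0, 4.0])                    # n(t)
theta = np.array([0.5, 1.0, 2.0])           # parameter vector
dn_dtheta = np.array([[1.0, 0.2, 0.0],      # dn(t)/dtheta', 2 x 3
                      [0.3, 0.0, 4.0]])

E = np.diag(1.0 / n) @ dn_dtheta @ np.diag(theta)

# Entry (i, j) is (theta_j / n_i) dn_i/dtheta_j, as in Eq. (7.25):
check = theta[2] / n[1] * dn_dtheta[1, 2]
```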

**Example: A transient outbreak: elasticity to lower-level parameters** Consider a hypothetical size-classified population with

$$\mathbf{A} = \begin{pmatrix} 0.3763 & 0 & 0.8431 & 8.4312 \\ 0.1939 & 0.5421 & 0 & 0 \\ 0 & 0.1177 & 0.5240 & 0 \\ 0 & 0 & 0.1291 & 0.5254 \end{pmatrix}. \tag{7.28}$$

The asymptotic growth rate calculated as the dominant eigenvalue of **A** is *λ* = 0*.*92, so the population is headed for eventual decline. However, the initial condition

$$\mathbf{n}\_0 = \begin{pmatrix} 0 & 0 & 0 & 1 \end{pmatrix}^{\mathsf{T}} \tag{7.29}$$

(introduction of a large adult) produces a dramatic transient outbreak (Fig. 7.1), during which total population increases by over 900% and remains above its initial value for about 25 years.1
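The outbreak is easy to reproduce numerically. A sketch that projects (7.28) from the initial condition (7.29), confirms *λ* ≈ 0.92, and locates the peak of the weighted density **c**ᵀ**n**(t) under the size-related weights used later in this example:

```python
import numpy as np

# Projecting the outbreak of Eqs. (7.28)-(7.29).
A = np.array([[0.3763, 0.0,    0.8431, 8.4312],
              [0.1939, 0.5421, 0.0,    0.0   ],
              [0.0,    0.1177, 0.5240, 0.0   ],
              [0.0,    0.0,    0.1291, 0.5254]])
n = np.array([0.0, 0.0, 0.0, 1.0])          # introduction of one large adult
c = np.array([1.0, 2.0, 3.0, 4.0])          # size-related weights

lam = max(abs(np.linalg.eigvals(A)))        # asymptotic growth rate, ~0.92

N, W = [n.sum()], [c @ n]                   # total and weighted density
for _ in range(40):
    n = A @ n
    N.append(n.sum())
    W.append(c @ n)
N, W = np.array(N), np.array(W)
# Total density peaks at nearly 10 times its initial value within a few
# years, and stays above the initial value for about 25 years, even though
# lam < 1 guarantees eventual decline.
```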

If this were a pest, its asymptotic fate (extinction) would be reassuring, but *λ* would reveal nothing about the transient outbreak. A manager might want to know how changes in the lower-level survival probabilities *σi*, growth probabilities *γi*, and fertilities *fi* would affect the outbreak, where the elements of **A** are

$$\begin{aligned} a\_{ii} &= \sigma\_i (1 - \gamma\_i) & i &= 1, \dots, 4\\ a\_{i+1,i} &= \sigma\_i \gamma\_i & i &= 1, \dots, 3\\ a\_{1i} &= f\_i & i &= 3, 4. \end{aligned} \tag{7.30}$$

If the impact of the pest were related to size, the manager might measure population density with size-dependent weights, say **c**<sup>T</sup> = (1, 2, 3, 4). Two measures of damage might be the maximum of the outbreak and the cumulative population size over the entire outbreak. Finally, to put everything on a proportional basis, the manager might want to use elasticities.

<sup>1</sup>The curious reader may wish to know that **A** was obtained by a random search for size-classified matrices with high reactivity (Neubert and Caswell 1997; Caswell and Neubert 2005; Verdy and Caswell 2008).

Define *θ* as the 9 × 1 vector whose entries are *σ*1–*σ*4, *γ*1–*γ*3, and *f*3–*f*4. The derivatives *d*vec **A***/dθ* <sup>T</sup> are obtained from (7.30). The sensitivity of **n***(t)* to changes in *θ* is given by (7.6). Using (7.9) and (7.27) we obtain the elasticity of *N (t)* to *θ* as

$$\frac{\epsilon \, N(t)}{\epsilon \, \theta^{\mathsf{T}}} = \frac{1}{N(t)} \, \mathbf{c}^{\mathsf{T}} \frac{d\mathbf{n}(t)}{d\theta^{\mathsf{T}}} \, \mathcal{D}\,(\theta). \tag{7.31}$$

The peak of the outbreak occurs at *t* = 2; thus (7.17) gives the elasticity of the peak density to *θ* as

$$\frac{\epsilon N(2)}{\epsilon \boldsymbol{\theta}^{\mathsf{T}}} = \frac{1}{N(2)} \, \mathbf{c}^{\mathsf{T}} \frac{d\mathbf{n}(2)}{d\boldsymbol{\theta}^{\mathsf{T}}} \, \mathcal{D}\,(\boldsymbol{\theta}).\tag{7.32}$$

The cumulative density up to time *t* is given by (7.12) and the sensitivity by (7.13), so the elasticity is

$$\frac{1}{\mathbf{c}^{\mathsf{T}}\sum\_{i=0}^{t}\mathbf{n}(i)}\ \mathbf{c}^{\mathsf{T}}\sum\_{i=0}^{t}\frac{d\mathbf{n}(i)}{d\boldsymbol{\theta}^{\mathsf{T}}}\ \mathcal{D}\left(\boldsymbol{\theta}\right).\tag{7.33}$$

Results are shown in Fig. 7.2. The elasticities of the maximum outbreak density are very different from those of *λ*. The elasticity of the cumulative density over the first 5 years has a similar pattern, also very different from that of *λ*. However, by the end of the outbreak (25 years) the elasticity of cumulative density is quite similar to that

**Fig. 7.2** The elasticities of the maximum population density, of the cumulative densities up to *t* = 5 and *t* = 25, and of *λ* to the lower-level demographic parameters, for the outbreak shown in Fig. 7.1

of *λ*, so management over this time scale could reasonably rely on the elasticity of *λ* to compare control tactics. Intermediate steps and MATLAB code are found in an appendix to Caswell (2007).
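The full example requires the lower-level derivatives *d*vec **A***/dθ*<sup>T</sup> from (7.30); a simpler illustration of the same machinery takes *θ* = vec **A**, in which case *d*vec **A***/dθ*<sup>T</sup> is the identity. Because **n**(t) = **A**ᵗ**n**₀ is homogeneous of degree *t* in the elements of **A**, the elasticities of *N(t)* to those elements must sum to *t*, which provides a built-in check. A numpy sketch (column-major vec):

```python
import numpy as np

# Transient sensitivity with theta = vec A, so d vec A / d theta' = I.
# n(t) = A^t n0 is homogeneous of degree t in the elements of A, so the
# elasticities of N(t) to those elements sum to t (Euler's theorem).
A = np.array([[0.3763, 0.0,    0.8431, 8.4312],
              [0.1939, 0.5421, 0.0,    0.0   ],
              [0.0,    0.1177, 0.5240, 0.0   ],
              [0.0,    0.0,    0.1291, 0.5254]])
s = A.shape[0]
n = np.array([0.0, 0.0, 0.0, 1.0])
dn = np.zeros((s, s * s))                   # dn(0)/d(vec A)' = 0

T = 10
for _ in range(T):
    # sensitivity recursion with theta = vec A:
    dn = A @ dn + np.kron(n[None, :], np.eye(s))   # (n'(t) ⊗ I_s)
    n = A @ n

c = np.ones(s)
N = c @ n
eN = (c @ dn) * A.flatten(order='F') / N    # elasticities of N(T)
```

The Kronecker-product term `np.kron(n[None, :], np.eye(s))` is (**n**ᵀ(t) ⊗ **I**ₛ), with columns ordered to match the column-major vec used in `A.flatten(order='F')`.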

#### **7.5 Sensitivity of Time-Varying Models**

Now consider the time-varying model

$$\mathbf{n}(t+1) = \mathbf{A}\_t \mathbf{n}(t) \qquad \mathbf{n}(0) = \mathbf{n}\_0,\tag{7.34}$$

where **A***<sup>t</sup>* , *t* = 1*,...,T* is a specified sequence of matrices.

Take the differential of both sides of (7.34)

$$d\mathbf{n}(t+1) = \mathbf{A}\_{t}\,d\mathbf{n}(t) + (d\mathbf{A}\_{t})\,\mathbf{n}(t),\tag{7.35}$$

and apply the vec operator to obtain

$$d\mathbf{n}(t+1) = \mathbf{A}\_t\, d\mathbf{n}(t) + \left(\mathbf{n}^{\mathsf{T}}(t) \otimes \mathbf{I}\_s\right) \left(d\,\text{vec}\,\mathbf{A}\_t\right). \tag{7.36}$$

Not only the transient behavior of the population, but also the parameter vector *θ*, the matrix **A***<sup>t</sup>* , and the perturbation applied to *θ* may change over time. The sensitivity analysis must reflect both types of variation. So, let us treat **A***<sup>t</sup>* as a function of *θ(t)*, and consider a perturbation of *θ* at some time *u*. Applying the chain rule to (7.36), we obtain

$$\frac{d\mathbf{n}(t+1)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)} = \mathbf{A}\_{t} \frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)} + \left(\mathbf{n}^{\mathsf{T}}(t) \otimes \mathbf{I}\_{s}\right) \frac{d\,\text{vec}\,\mathbf{A}\_{t}}{d\boldsymbol{\theta}^{\mathsf{T}}(u)} \tag{7.37}$$

which has the same form as (7.6) except that the matrix and the matrix derivative vary over time.

Some useful simplifications follow from this formulation.

1. Perturbation of matrix elements. If *θ(t)* consists of the elements of vec **A***<sup>t</sup>* , then

$$\frac{d\,\text{vec}\,\mathbf{A}\_{t}}{d\boldsymbol{\theta}^{\mathsf{T}}(t)} = \mathbf{I}\_{s^{2}}\tag{7.38}$$

and can be eliminated from the expressions where it appears.

2. No time travel. Suppose that *θ(t)* is perturbed at some time *t* = *u*. Then

$$\frac{d\,\text{vec}\,\mathbf{A}\_{t}}{d\boldsymbol{\theta}^{\mathsf{T}}(u)} = \mathbf{0}\_{s^2 \times p} \qquad \text{for } t < u. \tag{7.39}$$

However, the effects of the perturbation continue after *t* = *u*, so that *d***n***(t)/dθ* <sup>T</sup> *(u)* will generally be non-zero for *t>u*.

3. Perturbations at every time. A permanent modification of the parameters can be considered a perturbation of *θ(t)* for every time *t* = 0*,* 1*,...*, so that

$$
\theta(t) \longrightarrow \theta(t) + d\theta. \tag{7.40}
$$

The sensitivity of the population vector is then

$$\frac{d\mathbf{n}(t+1)}{d\boldsymbol{\theta}^{\mathsf{T}}} = \mathbf{A}\_{t}\frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}} + \left(\mathbf{n}^{\mathsf{T}}(t) \otimes \mathbf{I}\_{s}\right)\frac{d\,\text{vec}\,\mathbf{A}\_{t}}{d\boldsymbol{\theta}^{\mathsf{T}}}.\tag{7.41}$$

4. Perturbation over a range of times. One might be interested in perturbation over some time period *T*<sup>1</sup> ≤ *t* ≤ *T*2. The effect of such a perturbation on transient dynamics is

$$\frac{d\mathbf{n}(t+1)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)} = \mathbf{A}\_{t} \frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)} + \left(\mathbf{n}^{\mathsf{T}}(t) \otimes \mathbf{I}\_{s}\right) J(t)\, \frac{d\,\text{vec}\,\mathbf{A}\_{t}}{d\boldsymbol{\theta}^{\mathsf{T}}(u)} \tag{7.42}$$

where *J (t)* is an indicator variable

$$J(t) = \begin{cases} 1 & T\_1 \le t \le T\_2\\ 0 & \text{otherwise.} \end{cases} \tag{7.43}$$
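Case 4 can be sketched as follows: a scalar parameter scaling fertility (baseline value 1) is perturbed only during the window *T*1 ≤ *t* ≤ *T*2, and the recursion with the indicator *J(t)* is checked against a finite-difference perturbation applied over the same window. The time-varying matrices are hypothetical:

```python
import numpy as np

# A scalar parameter theta (baseline 1) scales fertility; it is perturbed
# only during the window T1 <= t <= T2. Matrix values are hypothetical.
def A_t(t, theta):
    f = 1.2 + 0.1 * np.sin(t)               # time-varying fertility
    return np.array([[0.0, theta * f],
                     [0.5, 0.7]])

def dA_dtheta(t):
    return np.array([[0.0, 1.2 + 0.1 * np.sin(t)],
                     [0.0, 0.0]])

T, T1, T2 = 15, 3, 8
n0 = np.array([1.0, 1.0])

# Sensitivity recursion with the indicator J(t)
n, dn = n0.copy(), np.zeros(2)
for t in range(T):
    J = 1.0 if T1 <= t <= T2 else 0.0
    dn = A_t(t, 1.0) @ dn + J * (dA_dtheta(t) @ n)
    n = A_t(t, 1.0) @ n

# Finite-difference check: perturb theta only inside the window
def run(eps):
    m = n0.copy()
    for t in range(T):
        theta = 1.0 + (eps if T1 <= t <= T2 else 0.0)
        m = A_t(t, theta) @ m
    return m

h = 1e-6
fd = (run(h) - run(-h)) / (2 * h)
```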

These calculations have been extended to apply to population projections (Caswell and Sanchez Gassen 2015; Sanchez Gassen and Caswell 2018); see Sect. 7.8 below.

#### **7.6 Sensitivity of Subsidized Populations**

An interesting special case of time-varying models is that of subsidized populations (e.g., Pascual and Caswell 1991), which receive an input of individuals2

$$\mathbf{n}(t+1) = \mathbf{A}\_{t}\mathbf{n}(t) + \mathbf{b}(t). \tag{7.44}$$

The subsidy vector **b***(t)* might represent immigration, or the introduction of individual animals from a captive release program, or dispersal of the larvae of marine invertebrates or the seeds of plants. If **b***(t) <* 0, then it could represent the removal or harvest of individuals from the population (e.g., Hauser et al. 2006).<sup>3</sup>

Differentiating gives:

$$\frac{d\mathbf{n}(t+1)}{d\boldsymbol{\theta}^{\mathsf{T}}} = \mathbf{A}\_{t}\frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}} + \left(\mathbf{n}^{\mathsf{T}}(t)\otimes\mathbf{I}\_{s}\right)\frac{d\,\text{vec}\,\mathbf{A}\_{t}}{d\boldsymbol{\theta}^{\mathsf{T}}} + \frac{d\mathbf{b}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}}.\tag{7.45}$$

<sup>2</sup>See Chap. 10 and Caswell (2008) for analysis of the equilibria of both linear and nonlinear versions of this equation, with applications to organizational dynamics and marine invertebrates.

<sup>3</sup>This type of harvest is unstable in the long run, but we are dealing here with transient dynamics.

If *θ* affects only the vital rates and not the subsidy process, then *d***n***(t)/dθ*<sup>T</sup> reduces to (7.37), and the subsidy affects the sensitivity only through its effect on *(***n**<sup>T</sup>*(t)* ⊗ **I***s)*. On the other hand, setting *θ* = **b** gives the effect of changes in the subsidy process:

$$\frac{d\mathbf{n}(t+1)}{d\mathbf{b}^{\mathsf{T}}} = \mathbf{A}\_{t} \frac{d\mathbf{n}(t)}{d\mathbf{b}^{\mathsf{T}}} + \mathbf{I}\_{s}.\tag{7.46}$$
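For a time-invariant matrix and a constant subsidy vector, iterating (7.46) from a zero initial sensitivity gives *d***n***(t)/d***b**<sup>T</sup> = **I** + **A** + ··· + **A**<sup>t−1</sup>, since **n**(t) = **A**ᵗ**n**₀ + Σᵢ **A**ⁱ**b**. A numpy sketch verifying this, with a hypothetical matrix:

```python
import numpy as np

# Eq. (7.46) for a time-invariant matrix and constant subsidy: iterating
# from dn(0)/db' = 0 yields the geometric sum I + A + ... + A^(t-1).
A = np.array([[0.0, 1.1],
              [0.4, 0.6]])
s = A.shape[0]

T = 12
dn_db = np.zeros((s, s))
for _ in range(T):
    dn_db = A @ dn_db + np.eye(s)           # Eq. (7.46)

# Closed-form geometric sum of matrix powers
S = sum(np.linalg.matrix_power(A, i) for i in range(T))
```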

**Example: A subsidized model for the reintroduction of the Griffon vulture** The griffon vulture (*Gyps fulvus*) was once widely distributed in Europe, but has been eliminated from many areas, due primarily to poisoning and shooting. A reintroduction program has re-established a population in the Massif Central of southern France; Sarrazin and Legendre (2000) have analyzed this program. Reintroduction programs are increasingly important in conservation biology (Sarrazin and Barbault 1996; Snyder and Snyder 2000), and will become an important application of subsidized models. Transient dynamics are naturally critical for evaluating reintroduction programs, because the programs are of finite duration and are evaluated by short-term measures of success at, or shortly after, their conclusion.

In the case of the griffon vulture, birds can be introduced as juveniles or adults. Adults introduced from captivity have lower fertility and lower survival than wild adults. Here I use a simplification of the Sarrazin-Legendre model to show how transient sensitivity analysis could be used. The life cycle contains four age classes and a stage representing captive-reared adults (Fig. 7.3a). The survival of released adults is a fraction *p* of that of wild adults, and their fertility a fraction *q* of that of the wild adults. I assume these costs persist indefinitely; Sarrazin and Legendre (2000) explore both short- and long-term costs. Suppose that a manager is interested in the effects of the annual number *b*<sup>1</sup> of juveniles released, the number *b*<sup>5</sup> of adults released, and the relative survival *p* and relative fertility *q* of captive-reared adults.

One measure of success will be the population size at the end of the introduction program. The best such population, in terms of future population size, would be one with the highest total reproductive value, *N* = **v**<sup>T</sup> **n** (also called the stable equivalent population; see Chapters 8–9 of Keyfitz and Caswell 2005). The elasticity of stable equivalent population size4 is

$$\frac{\epsilon \, N}{\epsilon \, \theta^{\mathsf{T}}} = \frac{1}{\mathbf{v}^{\mathsf{T}} \mathbf{n}(t)} \mathbf{v}^{\mathsf{T}} \frac{d \mathbf{n}(t)}{d \theta^{\mathsf{T}}} \mathcal{D} \left( \theta \right) \qquad t = 1, \ldots, T \tag{7.47}$$

where **v** is the reproductive value vector from **A** and *θ* <sup>T</sup> = *b*<sup>1</sup> *b*<sup>5</sup> *p q* .

<sup>4</sup>The parameters under investigation here do not affect the reproductive value vector **v**. To analyze the sensitivity of stable equivalent population to, say, *σi*, would require the derivative of **v** as well; this is presented in Chap. 10.

**Fig. 7.3** (**a**) The life cycle graph and (**b**) the transient elasticity of stable equivalent population size *N (t)* = **v**<sup>T</sup>**n***(t)* to changes in juvenile introductions (*b*1), adult introductions (*b*5), adult survival costs (*p*), and adult fertility costs (*q*) for the Griffon vulture. Parameter values from Sarrazin and Legendre (2000); *σj* = 0*.*86, *σa* = 0*.*98, *f* = 0*.*33, *p* = 0*.*75, *q* = 0*.*51

Using parameter values in Sarrazin and Legendre (2000) and setting *b*<sup>1</sup> = *b*<sup>5</sup> (i.e., evaluating the value of juveniles and adults from a situation where they are introduced in equal numbers) gives the result in Fig. 7.3b, for an introduction program duration of up to 20 years.

It is always better to increase the number of juveniles relative to the number of adults introduced. The benefits of reducing survival and fertility costs (i.e., increasing *p* or *q*) increase with the duration of the program, because the improvements have longer times available to operate. Reductions in the survival cost would have more impact than reductions in the fertility costs. These results are strongly influenced by the fact that the reproductive value of captive-reared adults is lower than that of newly fledged or released juveniles, which is reflected in the high elasticity of *N (t)* to juvenile releases.

#### **7.7 Sensitivity of Nonlinear Models**

In density- or frequency-dependent models, the vital rates depend on the parameters *θ* and current population density **n***(t)*:

$$\mathbf{n}(t+1) = \mathbf{A}[\boldsymbol{\theta}, \mathbf{n}(t)]\,\mathbf{n}(t).\tag{7.48}$$

Changes in *θ* affect dynamics directly, through **A**, and indirectly, through **n***(t)*. The transient sensitivity of **n***(t)* to parameter changes must include both effects.

Differentiating both sides of (7.48) and applying the vec operator gives the familiar differential expression

$$d\mathbf{n}(t+1) = \mathbf{A}[\boldsymbol{\theta}, \mathbf{n}(t)]\,d\mathbf{n}(t) + \left(\mathbf{n}^{\mathsf{T}}(t) \otimes \mathbf{I}\_{s}\right) d\,\text{vec}\,\mathbf{A}[\boldsymbol{\theta}, \mathbf{n}(t)].\tag{7.49}$$

But now, unlike in the linear case, *d*vec **A** includes both direct effects through *θ* and indirect effects through **n**, so the total differential is

$$d\text{vec}\,\mathbf{A} = \frac{\partial \text{vec}\,\mathbf{A}}{\partial \boldsymbol{\theta}^{\mathsf{T}}}d\boldsymbol{\theta} + \frac{\partial \text{vec}\,\mathbf{A}}{\partial \mathbf{n}^{\mathsf{T}}} \frac{\partial \mathbf{n}(t)}{\partial \boldsymbol{\theta}^{\mathsf{T}}}d\boldsymbol{\theta}.\tag{7.50}$$

Substituting (7.50) into (7.49) gives

$$\begin{split} \frac{d\mathbf{n}(t+1)}{d\boldsymbol{\theta}^{\mathsf{T}}} &= \mathbf{A}[\boldsymbol{\theta}, \mathbf{n}(t)] \, \frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}} \\ &+ \left(\mathbf{n}^{\mathsf{T}}(t) \otimes \mathbf{I}\_{s}\right) \, \frac{\partial \text{vec}\, \mathbf{A}[\boldsymbol{\theta}, \mathbf{n}(t)]}{\partial \boldsymbol{\theta}^{\mathsf{T}}} \\ &+ \left(\mathbf{n}^{\mathsf{T}}(t) \otimes \mathbf{I}\_{s}\right) \, \frac{\partial \text{vec}\, \mathbf{A}[\boldsymbol{\theta}, \mathbf{n}(t)]}{\partial \mathbf{n}^{\mathsf{T}}(t)} \, \frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}} . \end{split} \tag{7.51}$$

The first two terms are familiar from the density-independent case; the third term accounts for the effects of *θ* on **A** through its effects on **n***(t)*. Rearranging terms gives the transient sensitivity,

$$\frac{d\mathbf{n}(t+1)}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left\{ \mathbf{A}[\boldsymbol{\theta}, \mathbf{n}(t)] + \left( \mathbf{n}^{\mathsf{T}}(t) \otimes \mathbf{I}\_{s} \right) \frac{\partial \text{vec} \, \mathbf{A}[\boldsymbol{\theta}, \mathbf{n}(t)]}{\partial \mathbf{n}^{\mathsf{T}}(t)} \right\} \, \frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}} $$
 
$$+ \left( \mathbf{n}^{\mathsf{T}}(t) \otimes \mathbf{I}\_{s} \right) \frac{\partial \text{vec} \, \mathbf{A}[\boldsymbol{\theta}, \mathbf{n}(t)]}{\partial \boldsymbol{\theta}^{\mathsf{T}}}. \tag{7.52}$$

**Example: Transient sensitivity of** *Tribolium* Flour beetles of the genus *Tribolium* have been used for a series of models of, and experiments on, nonlinear dynamics, reviewed by Cushing et al. (2003). *Tribolium* lives in stored flour. Adults and larvae cannibalize eggs, and adults cannibalize pupae; these interactions provide the density-dependence, and are captured in a three-stage (larvae, pupae, and adults) model, with

$$\mathbf{A}[\boldsymbol{\theta}, \mathbf{n}] = \begin{pmatrix} 0 & 0 & b \exp(-c\_{el}n\_1 - c\_{ea}n\_3) \\ 1 - \mu\_l & 0 & 0 \\ 0 & \exp(-c\_{pa}n\_3) & 1 - \mu\_a \end{pmatrix} \tag{7.53}$$

where *b* is the clutch size, *cea*, *cel*, and *cpa* are cannibalism rates (of eggs by adults, eggs by larvae, and pupae by adults), and *μl* and *μa* are larval and adult mortalities. Parameter values from experiments reported by Costantino et al. (1997) give the transient dynamics in Fig. 7.4, following introduction of a single adult.

The sensitivity of this transient behavior requires the derivatives of **A**[*θ,* **n**] to the parameters and to the densities. Substituting these derivatives into (7.52) gives the transient sensitivities by a simple iteration. The derivative matrices are given in an appendix to Caswell (2007).

**Fig. 7.4** (**a**) The transient dynamics of the *Tribolium* model following introduction of a single adult. Parameters from Costantino et al. (1997). (**b**) The transient elasticity of the metabolic population size *Nm(t)* to each of the parameters of the *Tribolium* model, for the first 20 time steps following the introduction of a single adult

*Tribolium* is a pest. The damage it causes might, I suppose, be related to its consumption, which might be measured by the metabolic rate. Emekci et al. (2001) estimated the per capita metabolic rates of larvae, pupae, and adults. Using their results, we define the metabolic population size as *Nm(t)* = **c**<sup>T</sup>**n***(t)*, where **c**<sup>T</sup> = (9, 1, 4.5) *μ*l CO2 h−1. The elasticities of *Nm(t)* to the parameters are

$$\frac{\epsilon N\_m}{\epsilon \boldsymbol{\theta}^{\mathsf{T}}} = \frac{1}{N\_m(t)} \, \mathbf{c}^{\mathsf{T}} \frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}} \, \mathcal{D}\left(\boldsymbol{\theta}\right) \tag{7.54}$$

for *t* = 1*,...,* 20.

The results are shown in Fig. 7.4. For the first 5 or so iterations, *Nm* is more elastic to the clutch size than to the cannibalism or mortality rates. After that, the impact of *b* declines and the (negative) impact of the cannibalism coefficients increases. Beyond 10 time steps, *Nm* is affected primarily by *b* (positively) and *cea* (negatively). Changes in mortality (*μa* and *μl*) have only small effects. Such changes in the relative impact of the parameters over short periods of time are typical of transient sensitivities. Interestingly, the elasticities of total population size *N*tot = Σ*ni* (not shown) show a similar pattern, but lack the period-2 fluctuation evident in Fig. 7.4. This reflects the interaction of the weighting pattern (much more uneven in the calculation of *Nm* than of *N*tot) with transient fluctuations in the stage distribution. Asymptotic sensitivity calculations are unaffected by such differences.

The parameter values used here lead to a stable equilibrium, but the transient calculations apply equally to other types of dynamics.
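The iteration of (7.52) for the model (7.53) can be sketched as follows, with the clutch size *b* as a scalar parameter. The parameter values below are illustrative only, not the fitted values of Costantino et al. (1997), and the analytic sensitivity is checked against a finite-difference perturbation of *b*:

```python
import numpy as np

# Nonlinear transient sensitivity, Eq. (7.52), for the model (7.53), with
# clutch size b as a scalar parameter. Values are illustrative only, NOT
# the fitted values of Costantino et al. (1997).
b, cel, cea, cpa = 7.0, 0.012, 0.011, 0.02
mul, mua = 0.2, 0.3
s = 3

def A_of(n, bval):
    L, P, Aa = n
    return np.array([
        [0.0,       0.0,               bval * np.exp(-cel * L - cea * Aa)],
        [1.0 - mul, 0.0,               0.0],
        [0.0,       np.exp(-cpa * Aa), 1.0 - mua]])

def dvecA_db(n):
    """d vec A / d b (column-major vec): only a13 depends on b."""
    L, P, Aa = n
    d = np.zeros(s * s)
    d[6] = np.exp(-cel * L - cea * Aa)      # vec index of a13 is 0 + 3*2 = 6
    return d

def dvecA_dn(n):
    """d vec A / d n' (9 x 3): the density-dependent entries a13 and a32."""
    L, P, Aa = n
    D = np.zeros((s * s, s))
    a13 = b * np.exp(-cel * L - cea * Aa)
    D[6, 0] = -cel * a13                    # d a13 / d n1
    D[6, 2] = -cea * a13                    # d a13 / d n3
    D[5, 2] = -cpa * np.exp(-cpa * Aa)      # d a32 / d n3
    return D

T = 20
n, dn = np.array([0.0, 0.0, 1.0]), np.zeros(s)   # one adult; dn(0)/db = 0
for _ in range(T):
    K = np.kron(n[None, :], np.eye(s))      # (n'(t) ⊗ I_s)
    A = A_of(n, b)
    dn = (A + K @ dvecA_dn(n)) @ dn + K @ dvecA_db(n)   # Eq. (7.52)
    n = A @ n

# Finite-difference check on b
def traj(bval):
    m = np.array([0.0, 0.0, 1.0])
    for _ in range(T):
        m = A_of(m, bval) @ m
    return m

h = 1e-6
fd = (traj(b + h) - traj(b - h)) / (2 * h)
```

The bracketed matrix in the recursion combines the direct effect of density on vec **A** with the usual linear term, exactly as in (7.52).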

#### **7.8 Sensitivity of Population Projections**

The most common transient analyses of populations appear in the population projections provided by local, national, and international offices. These projections are usually carried out by the cohort component method, which uses mortality, fertility, and migration to describe the dynamics of each age×sex combination. The calculations are transient because they begin with the current, rather than an asymptotic, age-sex distribution and are carried out over a short time horizon (usually a few decades). In the first issue of the first volume of the then-new journal *Demography*, Nathan Keyfitz described the "population projection as a matrix operator" (Keyfitz 1964). He showed that population projections using the cohort component method could be written as matrix population models, and emphasized the value of doing so: it focuses attention on the mathematical structure of the projection and invites deeper analyses of its properties with more powerful mathematical tools. Considering projections as matrix operators allows the use of matrix calculus methods to develop a thorough perturbation analysis of population projections (Caswell and Sanchez Gassen 2015; Sanchez Gassen and Caswell 2018).

To present the basics of projection sensitivity analysis, we begin with a simple one-sex model, but we focus most of our attention on a two-sex model that includes separate rates for males and females.

The single-sex projection can be written as

$$\mathbf{n}(t+1) = \mathbf{A}(t)\mathbf{n}(t) + \mathbf{b}(t) \qquad \mathbf{n}(0) = \mathbf{n}\_0 \tag{7.55}$$

where **n***(t)* is a vector whose entries are the numbers of individuals in each age class or stage at time *t*, **A***(t)* is a projection matrix incorporating the vital rates at time *t*, and **b***(t)* is a vector giving the number of immigrants in each age class or stage at time *t*. The projection begins with a specified initial condition, denoted **n**0, and is carried out until some target time *T* .

To develop a two-sex projection, we define population vectors **n***<sup>f</sup>* and **n***m*, and projection matrices **A***<sup>f</sup>* and **A***m*, for females and males, respectively. We assume that reproduction is female dominant,<sup>5</sup> so all fertility is attributed to females. We decompose the projection matrices for females and males into

$$\mathbf{A}\_f(t) = \mathbf{U}\_f(t) + \phi \mathbf{F}(t) \tag{7.56}$$

$$\mathbf{A}\_m(t) = \mathbf{U}\_m(t) \tag{7.57}$$

where **U** describes transitions and survival of extant individuals and **F** describes the production of new individuals by reproduction.

<sup>5</sup>Two-sex models that do not assume dominance by one sex have been used to project animal populations, but not, as far as I know, human populations (e.g., Jenouvrier et al. 2009, 2010, 2012).


In an age-classified model, **F** will have effective fertilities (including infant and maternal survival as appropriate) on the first row and zeros elsewhere. A proportion *φ* of the offspring are female. This model attributes reproduction to females; hence there is no need to create separate fertility matrices for reproduction by males and females.

The male component of the population is projected by the survival matrix **U***m*; the input of new individuals comes from the female population. The projection model becomes

$$\mathbf{n}\_f(t+1) = \left[\mathbf{U}\_f(t) + \phi \mathbf{F}(t)\right] \mathbf{n}\_f(t) + \mathbf{b}\_f(t) \tag{7.58}$$

$$\mathbf{n}\_m(t+1) = \mathbf{U}\_m(t)\mathbf{n}\_m(t) + (1-\phi)\mathbf{F}(t)\mathbf{n}\_f(t) + \mathbf{b}\_m(t) \tag{7.59}$$

The sensitivity of the two-sex projection is given by the two derivatives,

$$\frac{d\mathbf{n}\_f(t)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)} \quad \text{and} \quad \frac{d\mathbf{n}\_m(t)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)} \qquad t, u = 0, \ldots, T.$$

These sensitivities are obtained from dynamic expressions, for the female population

$$\begin{aligned} \underbrace{\frac{d\mathbf{n}\_f(t+1)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)}}\_{\text{sensitivity at } t+1} &= \underbrace{\left(\mathbf{U}\_f(t) + \phi\mathbf{F}(t)\right)\frac{d\mathbf{n}\_f(t)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)}}\_{\text{sensitivity at } t} + \underbrace{\left(\mathbf{n}\_f^{\mathsf{T}}(t) \otimes \mathbf{I}\_{\omega}\right)\left(\frac{d\,\text{vec}\,\mathbf{U}\_f(t)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)} + \phi\,\frac{d\,\text{vec}\,\mathbf{F}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)}\right)}\_{\text{effects via female transitions and fertility}} \\ &\quad + \underbrace{\frac{d\mathbf{b}\_f(t)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)}}\_{\text{effects via immigration}} \end{aligned} \tag{7.60}$$

and the male population

$$\begin{aligned} \underbrace{\frac{d\mathbf{n}\_m(t+1)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)}}\_{\text{sensitivity at } t+1} &= \underbrace{\mathbf{U}\_m(t)\frac{d\mathbf{n}\_m(t)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)} + (1-\phi)\mathbf{F}(t)\frac{d\mathbf{n}\_f(t)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)}}\_{\text{sensitivities at } t} + \underbrace{\left(\mathbf{n}\_m^{\mathsf{T}}(t)\otimes\mathbf{I}\_{\omega}\right)\frac{d\,\text{vec}\,\mathbf{U}\_m(t)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)}}\_{\text{effects via male transitions}} \\ &\quad + \underbrace{(1-\phi)\left(\mathbf{n}\_f^{\mathsf{T}}(t)\otimes\mathbf{I}\_{\omega}\right)\frac{d\,\text{vec}\,\mathbf{F}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)}}\_{\text{effects via female fertility}} + \underbrace{\frac{d\mathbf{b}\_m(t)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)}}\_{\text{effects via immigration}} \end{aligned} \tag{7.61}$$

Equations (7.60) and (7.61) are iterated from initial conditions

$$\frac{d\mathbf{n}\_f(0)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)} = \frac{d\mathbf{n}\_m(0)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)} = \mathbf{0}\_{\omega\times p}\tag{7.62}$$

along with the iteration of equations (7.58) and (7.59) for the population vectors **n***<sup>f</sup> (t)* and **n***m(t)*. For complete details, see Caswell and Sanchez Gassen (2015).

The terms in (7.60) and (7.61) are labelled to show how the processes of transitions, fertility, and migration, for males and females, combine to produce the sensitivity of a transient population. As before, the sensitivity at *t* + 1 depends on the sensitivity at time *t* and on the effects of the parameter vector on the transition and fertility matrices and on the immigration vector. In the next section we turn to the calculation of these derivatives.

The elasticities of **n***<sup>f</sup> (t)* are given by

$$\frac{\epsilon \mathbf{n}\_f(t)}{\epsilon \theta^\top(u)} = \mathcal{D} \left[ \mathbf{n}\_f(t) \right]^{-1} \frac{d \mathbf{n}\_f(t)}{d \theta^\top(u)} \mathcal{D} \left[ \boldsymbol{\theta}(u) \right] \tag{7.63}$$

with a similar expression for **n***m*.
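As a compact illustration, take *θ* to be the scalar *φ*, with time-invariant matrices and no immigration. The fertility terms of (7.60) and (7.61) then reduce to +**Fn***f(t)* for females and −**Fn***f(t)* for males. A numpy sketch with hypothetical matrix values, checked against finite differences:

```python
import numpy as np

# Sensitivity of the two-sex projection (7.58)-(7.59) to the proportion
# female phi, with time-invariant matrices and no immigration. All matrix
# values are hypothetical.
Uf = np.array([[0.0, 0.0], [0.90, 0.80]])
Um = np.array([[0.0, 0.0], [0.85, 0.75]])
F  = np.array([[0.0, 1.4], [0.0, 0.0]])
phi = 0.5

def project(phi, T):
    nf, nm = np.array([10.0, 10.0]), np.array([10.0, 10.0])
    for _ in range(T):
        nf, nm = (Uf + phi * F) @ nf, Um @ nm + (1 - phi) * F @ nf
    return nf, nm

T = 15
nf, nm = np.array([10.0, 10.0]), np.array([10.0, 10.0])
dnf, dnm = np.zeros(2), np.zeros(2)
for _ in range(T):
    # fertility terms reduce to +F nf (females) and -F nf (males)
    dnf, dnm = ((Uf + phi * F) @ dnf + F @ nf,
                Um @ dnm + (1 - phi) * F @ dnf - F @ nf)
    nf, nm = (Uf + phi * F) @ nf, Um @ nm + (1 - phi) * F @ nf

h = 1e-6
fd_f = (project(phi + h, T)[0] - project(phi - h, T)[0]) / (2 * h)
fd_m = (project(phi + h, T)[1] - project(phi - h, T)[1]) / (2 * h)
```

The tuple assignments update both sensitivities from the time-*t* values, in parallel with the population vectors, as the dynamic formulation requires.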

Caswell and Sanchez Gassen (2015) present a detailed analysis of a projection for the population of Spain, published by the Instituto Nacional de Estadística (INE), for the years 2012–2052. They calculated the sensitivity and elasticity of total population, male and female population, the school age population (6–16 years), the part of the population expected to suffer from dementia, and the dependency and support ratios. All these outcomes are calculated from the basic projection using the methods in Sect. 7.3. In a more extensive comparison, Sanchez Gassen and Caswell (2018) have applied the approach to the Europop2013 projections for the 28 member states of the European Union, plus Iceland, Norway, and Sweden, for the years 2013–2080.

#### **7.9 Discussion**

In addition to their obvious role in population projections, transient effects are critically important in studies of climate change and other short term management issues (Ezard et al. 2010). A recent study found that simulations of invasive species were strongly influenced by transient effects (Muthukrishnan et al. 2018). Matrix calculus makes transient sensitivity analysis straightforward and applicable to a wide range of models and perturbations. The approach calculates sensitivities and elasticities as a dynamic system, iterated in parallel with the dynamics of the transient solution itself.

This dynamic approach reveals the fundamental structure underlying the sensitivity calculation. The results bear a striking family resemblance, from the linear, time-invariant case (7.6), to the time-varying case (7.41), the case of subsidized populations (7.45), the nonlinear case (7.52), and the time-varying, two-sex, subsidized model that forms the basis for the cohort component method of population projection in equations (7.61) and (7.60).

The examples here sound like stories—*suppose that* someone (e.g., a manager) is interested in some aspect of the population (e.g., its total size, variance, or average growth) over some time interval. Or *suppose* that mortality, fertility, and immigration develop in the following way. This emphasizes the flexibility of the approach, and also the importance of thinking clearly about the dependent variables and time scales of interest. The list of dependent variables in Sect. 7.3 can no doubt be extended. It may be repeating the obvious, but transient sensitivity analysis depends on initial conditions. Each of the examples had to choose an initial condition and argue for its relevance.

Section 10.2.6 in Chap. 10 briefly considers the sensitivity analysis of equilibria in continuous-time systems. Richard et al. (2015) have developed a very general sensitivity analysis of transient dynamics in continuous systems (both linear and nonlinear). They point out, and nicely demonstrate, the parallels between continuous-time models and the discrete-time models considered here, the link being the creation of a dynamic model for the sensitivities that is solved along with the dynamics of the system itself.

#### **Bibliography**



## **Chapter 8 Periodic Models**

#### **8.1 Introduction**

Periodic matrix models are often used to study cyclical temporal variation (seasonal or interannual), sometimes as a (perhaps crude) approximation to stochastic models. However, formally periodic models also appear when multiple processes (e.g., demography and dispersal) operate within a single projection interval. The models take the form of periodic matrix products. A familiar example is when population projection over an annual interval is described as a product of seasonal operators. The perturbation analysis of periodic models (Caswell and Trevisan 1994; Lesnoff et al. 2003; Caswell and Shyu 2012) must specify both the vital rates affected by the perturbation and the timing of the perturbation within the cycle. This chapter presents a general approach to the perturbation analysis of both linear and nonlinear periodic models. The results consist of a series of analyses of some of the most commonly encountered periodic models.

If the environment is time-invariant on the scale of a chosen projection interval (e.g., from year to year), the result is a periodic matrix population model in which the seasonal product repeats itself. Such a model can be written as

$$\mathbf{n}(t+1) = \mathbf{B}\_p \cdots \mathbf{B}\_2 \mathbf{B}\_1 \mathbf{n}(t) \tag{8.1}$$

Here, **B***<sup>i</sup>* is the matrix at phase *i* of the cycle and *p* is the period. The period is the number of phases in the cycle; i.e., the number of matrices in the periodic matrix product in (8.1). Neither the identities nor the number of stages need be the same from one phase to the next, so the matrices **B***<sup>i</sup>* may be rectangular rather than square.

Chapter 8 is modified, under the terms of a Journal Publishing Agreement with Elsevier Publishers, from: Caswell, H. and E. Shyu. 2012. Sensitivity analysis of periodic matrix population models. Theoretical Population Biology. 82:329–339. ©Elsevier.

The phases need not be the same length, so the period may or may not be measured in units of time. For example, in the model of Pico et al. (2002), each season is of 2 months duration, and the period (*p* = 6) corresponds directly to a time scale. In contrast, the model of Hunter and Caswell (2005a) has three phases, with durations of 3 weeks, 5 weeks, and 10 months, respectively. The period (*p* = 3) of that model does not correspond to a time scale, but it identifies the number of matrices in the periodic product and appears in calculations in the same role as *p* = 6 in the model of Pico et al. (2002).

The projection matrix over the entire periodic cycle is<sup>1</sup>

$$\mathbf{A} = \mathbf{B}\_p \cdots \mathbf{B}\_2 \mathbf{B}\_1 \tag{8.2}$$

The earliest studies of periodic matrix models were due to Darwin and Williams (1964), Skellam (1966), and MacArthur (1968). In recent years, with little fanfare, periodic models have emerged as an important tool for incorporating multiple processes within a single projection interval. Uses of periodic models include the following.

1. Seasonal variation. Plants and animals experience obvious and dramatic seasonal variation in their demographic rates. Periodic models have been used to describe this variation, with seasons variously defined in terms of monthly periods, calendar seasons, or in terms of environmental events such as rainfall or flood patterns (e.g., Smith et al. 2005).

Although annual or near-annual species are obvious candidates for periodic models, within-year time scales may also be important for long-lived species. For example, Hunter and Caswell (2005a) incorporated chick development events on a time scale of weeks into a periodic model for the sooty shearwater, which has a lifetime of decades. Similarly, Jenouvrier et al. (2010, 2014) have used periodic models to capture the timing of events in the breeding cycle within a portion of the year in the long-lived emperor penguin.

<sup>1</sup>Although we will not address it in this chapter, the model (8.1) can be written in a way that explicitly defines the starting phase in the cycle. As written, **A** in (8.2) projects from phase 1 to phase 1; if desired we could write this as **A**<sup>1</sup> and define matrices

$$\begin{aligned} \mathbf{A}\_2 &= \mathbf{B}\_1 \mathbf{B}\_p \cdots \mathbf{B}\_2 \\\\ &\;\vdots \\\\ \mathbf{A}\_p &= \mathbf{B}\_{p-1} \cdots \mathbf{B}\_1 \mathbf{B}\_p \end{aligned}$$

The **A***<sup>i</sup>* are obtained by cyclic permutations of the sequence {**B***<sup>p</sup>,...,***B**<sup>1</sup>}; each of these projects from a different phase in the cycle. Some demographic properties (e.g., the population growth rate *λ*) are invariant with respect to such permutations; others (e.g., the eigenvectors) are not (Caswell 2001). In this chapter, we will start with phase 1 and refer to **A** rather than **A**<sup>1</sup>.
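This permutation property is easy to check numerically. The book's computations are carried out in MATLAB; the following NumPy sketch (all names are illustrative) builds the periodic product (8.2) from random seasonal matrices and confirms that the growth rate *λ* is unchanged by a cyclic permutation of the **B***<sup>i</sup>*.

```python
import numpy as np

rng = np.random.default_rng(1)
p, s = 4, 3
B = [rng.random((s, s)) for _ in range(p)]           # seasonal matrices B_1..B_p

def periodic_product(mats):
    """A = B_p ... B_2 B_1: matrices applied right to left."""
    A = np.eye(mats[0].shape[1])
    for M in mats:
        A = M @ A
    return A

A1 = periodic_product(B)                             # projects phase 1 -> phase 1
A2 = periodic_product(B[1:] + B[:1])                 # cyclic permutation: B_1 B_p ... B_2

lam1 = max(abs(np.linalg.eigvals(A1)))
lam2 = max(abs(np.linalg.eigvals(A2)))
print(lam1, lam2)                                    # equal: lambda is permutation-invariant
```

The eigenvectors of `A1` and `A2` would, in contrast, differ, because each product projects from a different phase of the cycle.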


Periodic products also arise when the processes within a single projection interval are decomposed. For example, if survival and stage transitions are treated as separate processes, the transition matrix **U** can be written as the product of a column-stochastic matrix **G** of transition probabilities, conditional on survival, and a diagonal matrix **Σ** of survival probabilities,

$$\mathbf{U} = \mathbf{G}\boldsymbol{\Sigma}\tag{8.3}$$

which creates a period-2 periodic matrix product within the model.


#### *8.1.1 Perturbation Analysis*

As in Fig. 8.1, we suppose that in phase *i* of the cycle, the parameter vector takes on the value *θ<sup>i</sup>* and determines the matrix **B***i*. The projection matrix **A** is the product, in the specified order, of the **B***i*. Although the output *ξ* is calculated from **A**, the parameter dependence operates through the **B***<sup>i</sup>* (Fig. 8.1). The sensitivity of *ξ* to the elements of **A** is in general not of interest, because those elements are complicated expressions involving the elements of all the **B***i*, and thus mix disparate biological processes. Here we calculate the sensitivity of demographic outcomes to the entries of the **B***i*.

In this chapter we analyze linear periodic models of the form (8.1) and the cyclic dynamics of nonlinear seasonal models with delayed density effects. We will briefly discuss the generalization of the multistate age×stage-classified models


**Table 8.1** Table of symbols in this chapter

**Fig. 8.1** A vector *θ* of parameters determines an output variable *ξ* , which may be a scalar, vector, or matrix. The parameter vector will generally take on different values at each phase in the cycle, and determine the phase-specific matrix **B***i*. These matrices determine the projection matrix **A** as a periodic matrix product; the output variable is computed from **A**. The perturbation problem is to compute the sensitivity or elasticity of *ξ* to *θ*

explored in Chap. 6 to an arbitrary number of classifications. We extend the LTRE decomposition analysis to the periodic case, making it possible to analyze effects of parameter changes at any point in a periodic environment.

#### **8.2 Linear Models**

Consider the basic model (8.1) with projection matrix (8.2). The period of the cycle is *p*. To allow for differences in the state vector at different phases within the cycle, define the number of stages at phase *i* as *si*. Thus the matrix **B***<sup>i</sup>* is of dimension *si*+<sup>1</sup> × *si*, with the subscript *i* interpreted mod*(p)* (that is, *(p* + 1*)* mod*(p)* = 1).

Let *ξ* (dimension *m* × 1) denote an output variable calculated from **A**, where *ξ* might be a scalar, a vector, or a vectorized matrix. Let *θ* be a parameter vector (dimension *q* × 1). The derivative of *ξ* with respect to *θ* is the *m* × *q* matrix

$$\frac{d\xi}{d\theta^{\mathsf{T}}} = \left(\frac{d\xi\_{l}}{d\theta\_{j}}\right) \qquad i = 1, \ldots, m; \ j = 1, \ldots, q \tag{8.4}$$

By the chain rule, the effects of the parameters on *ξ* are captured in the matrix product

$$\frac{d\boldsymbol{\xi}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \frac{d\boldsymbol{\xi}}{d\text{vec}^{\mathsf{T}}\mathbf{A}}\, \frac{d\text{vec}\,\mathbf{A}}{d\boldsymbol{\theta}^{\mathsf{T}}}.\tag{8.5}$$

The first term in (8.5) is the derivative of the output variable *ξ* with respect to the matrix **A** from which it is calculated. The second term in (8.5) is the derivative of the periodic product matrix **A** with respect to the parameter vector *θ*. To obtain this, differentiate (8.2), to obtain

$$d\mathbf{A} = \mathbf{B}\_p \cdots \mathbf{B}\_2 \, (d\mathbf{B}\_1) + \mathbf{B}\_p \cdots (d\mathbf{B}\_2) \, \mathbf{B}\_1 + \cdots + \left(d\mathbf{B}\_p\right) \mathbf{B}\_{p-1} \cdots \mathbf{B}\_1 \tag{8.6}$$

It is convenient to define the matrix **C***<sub>i</sub><sup>j</sup>* as the ordered product (from right to left) of the **B** matrices from *i* up to *j*:

$$\mathbf{C}\_i^j = \mathbf{B}\_j \cdots \mathbf{B}\_i \qquad i \le j \tag{8.7}$$

and set $\mathbf{C}\_1^0 = \mathbf{C}\_{p+1}^p = \mathbf{I}\_{s\_1}$. Then (8.6) becomes

$$d\mathbf{A} = \mathbf{C}\_2^p \left(d\mathbf{B}\_1\right) + \mathbf{C}\_3^p \left(d\mathbf{B}\_2\right) \mathbf{C}\_1^1 + \cdots + \left(d\mathbf{B}\_p\right) \mathbf{C}\_1^{p-1} \tag{8.8}$$

Applying the vec operator to both sides gives

$$d\text{vec}\,\mathbf{A} = \sum\_{i=1}^{p} \left[ \left(\mathbf{C}\_{1}^{i-1}\right)^{\mathsf{T}} \otimes \mathbf{C}\_{i+1}^{p} \right] \, d\text{vec}\,\mathbf{B}\_i \tag{8.9}$$

Equation (8.9) accounts automatically for the possibly different dimensions of the **B***i*. The resulting derivative with respect to the parameter vector *θ* is

$$\frac{d\text{vec}\,\mathbf{A}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \sum\_{i=1}^{p} \left[ \left( \mathbf{C}\_{1}^{i-1} \right)^{\mathsf{T}} \otimes \mathbf{C}\_{i+1}^{p} \right] \frac{d\text{vec}\,\mathbf{B}\_i}{d\boldsymbol{\theta}^{\mathsf{T}}}.\tag{8.10}$$

where *d*vec **B***i/dθ* <sup>T</sup> is the derivative of the matrix **B***<sup>i</sup>* with respect to the parameter vector *θ*, evaluated at *θi*. Equation (8.10) sums the contributions of the derivatives of all of the phase-specific matrices **B***<sup>i</sup>* with respect to *θ*, thus accounting for all the ways in which *θ* may affect the demographic rates at each point in the cycle. As written, (8.10) gives the result of perturbing *θ* at each point in the cycle. The effect of a phase-specific perturbation is easily obtained by summing only over phases in which *θ<sup>i</sup>* is modified.
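Equation (8.9) can be verified by finite differences. The following NumPy sketch (illustrative names; the book itself uses MATLAB) forms the bracket in (8.9) for one phase of a small random model and compares one of its columns against a numerical perturbation of the corresponding entry of **B***<sup>i</sup>*.

```python
import numpy as np

rng = np.random.default_rng(2)
p, s = 3, 3
B = [rng.random((s, s)) for _ in range(p)]           # seasonal matrices B_1..B_p
vec = lambda M: M.reshape(-1, order='F')             # column-major vec

def C(i, j):
    """C_i^j = B_j ... B_i of Eq. (8.7); identity when i > j."""
    A = np.eye(s)
    for k in range(i, j + 1):
        A = B[k - 1] @ A
    return A

def A_of(Bs):
    A = np.eye(s)
    for M in Bs:
        A = M @ A
    return A

# bracket of Eq. (8.9) for phase i: d vec A / d vec^T B_i
i = 2
dA_dBi = np.kron(C(1, i - 1).T, C(i + 1, p))

# finite-difference check on entry (k, l) of B_i
eps = 1e-7
k, l = 1, 2
Bpert = [M.copy() for M in B]
Bpert[i - 1][k, l] += eps
fd = (vec(A_of(Bpert)) - vec(A_of(B))) / eps
ok = np.allclose(fd, dA_dBi[:, l * s + k], atol=1e-5)
print(ok)
```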

Substituting (8.10) into the formula (8.5) gives the general expression for the sensitivity of *ξ* to changes affecting any or all of the **B***i*:

$$\frac{d\boldsymbol{\xi}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \frac{d\boldsymbol{\xi}}{d\text{vec}^{\mathsf{T}}\mathbf{A}} \left( \sum\_{i=1}^{p} \left[ \left( \mathbf{C}\_{1}^{i-1} \right)^{\mathsf{T}} \otimes \mathbf{C}\_{i+1}^{p} \right] \frac{d\text{vec}\,\mathbf{B}\_i}{d\boldsymbol{\theta}^{\mathsf{T}}} \right). \tag{8.11}$$

The elasticity of *ξ* to *θ* is the matrix

$$\frac{\epsilon\boldsymbol{\xi}}{\epsilon\boldsymbol{\theta}^{\mathsf{T}}} = \left(\frac{\theta\_j}{\xi\_i}\frac{d\xi\_i}{d\theta\_j}\right) \tag{8.12}$$

$$=\mathcal{D}\left(\boldsymbol{\xi}\right)^{-1}\frac{d\boldsymbol{\xi}}{d\text{vec}\,^{\mathsf{T}}\mathbf{A}}\left(\sum\_{i=1}^{p}\left[\left(\mathbf{C}\_{1}^{i-1}\right)^{\mathsf{T}}\otimes\mathbf{C}\_{i+1}^{p}\right]\frac{d\text{vec}\,\mathbf{B}\_i}{d\boldsymbol{\theta}^{\mathsf{T}}}\right)\mathcal{D}\left(\boldsymbol{\theta}\right)\tag{8.13}$$

where D *(***x***)* is a diagonal matrix with **x** on the diagonal and zeros elsewhere. Because elasticities are logarithmic derivatives, they apply only when *ξ >* 0 and *θ* ≥ 0.
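A useful consistency check on these formulas comes from homogeneity: *λ* is homogeneous of degree 1 in **A** and of degree *p* in the set of **B***<sup>i</sup>*, so the elasticities of *λ* to the entries of **A** sum to 1, while those to the entries of all the **B***<sup>i</sup>* together sum to *p*. The NumPy sketch below (illustrative names; *dλ/d*vec<sup>T</sup>**A** is computed from the standard eigenvector formula *v*<sup>T</sup>*(d***A***)w/(v*<sup>T</sup>*w)*) checks both sums.

```python
import numpy as np

rng = np.random.default_rng(3)
p, s = 3, 3
B = [rng.random((s, s)) for _ in range(p)]           # seasonal matrices B_1..B_p
vec = lambda M: M.reshape(-1, order='F')             # column-major vec

def C(i, j):
    """C_i^j = B_j ... B_i of Eq. (8.7); identity when i > j."""
    A = np.eye(s)
    for k in range(i, j + 1):
        A = B[k - 1] @ A
    return A

A = C(1, p)
evals, evecs = np.linalg.eig(A)
k = np.argmax(evals.real)
lam = evals[k].real
w = np.abs(evecs[:, k].real)                         # right eigenvector of lambda
evalsL, evecsL = np.linalg.eig(A.T)
v = np.abs(evecsL[:, np.argmin(np.abs(evalsL - lam))].real)  # left eigenvector

dlam_dvecA = np.kron(w, v) / (v @ w)                 # d lambda / d vec^T A

eA = dlam_dvecA * vec(A) / lam                       # elasticities to entries of A
tot = 0.0
for i in range(1, p + 1):
    dlam_dBi = dlam_dvecA @ np.kron(C(1, i - 1).T, C(i + 1, p))
    tot += (dlam_dBi * vec(B[i - 1]) / lam).sum()
print(eA.sum(), tot)                                 # should be 1 and p
```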

#### *8.2.1 A Simple Harvest Model*

The projection matrix for a simple harvest model (e.g., Hauser et al. 2006) can be written

$$\mathbf{A} = \mathbf{B} \left( \mathbf{I} - \mathbf{H} \right). \tag{8.14}$$

The matrix **B** describes demography in the absence of harvest. The matrix **H** = D *(***h***)* is a harvest matrix, where *hi* is the probability that an individual of stage *i*

is harvested.<sup>2</sup> Either **B**, **H**, or both may be functions of a vector *θ* of parameters. Differentiating (8.14) and applying the vec operator gives

$$d\operatorname{vec}\mathbf{A} = -\left(\mathbf{I}\_s \otimes \mathbf{B}\right)d\operatorname{vec}\mathbf{H} + \left[\left(\mathbf{I} - \mathbf{H}\right)^{\mathsf{T}} \otimes \mathbf{I}\_s\right]d\operatorname{vec}\mathbf{B}.\tag{8.15}$$

The diagonal matrix **H** can be written

$$\mathbf{H} = \mathbf{I}\_s \circ \left( \mathbf{1}\_s \mathbf{h}^{\mathsf{T}} \right) \tag{8.16}$$

where **1***<sup>s</sup>* is an *s* × 1 vector of ones and ◦ denotes the Hadamard product. The differential of **H** in (8.16) is

$$d\text{vec}\,\mathbf{H} = \mathcal{D}\,\left(\text{vec}\,\mathbf{I}\_s\right)\left(\mathbf{I}\_s \otimes \mathbf{1}\_s\right)d\mathbf{h}.\tag{8.17}$$

Substituting (8.17) into (8.15) and applying the chain rule gives the derivative with respect to *θ*:

$$\frac{d\text{vec}\,\mathbf{A}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \underbrace{-\left(\mathbf{I}\_{s}\otimes\mathbf{B}\right)\mathcal{D}\left(\text{vec}\,\mathbf{I}\_{s}\right)\left(\mathbf{I}\_{s}\otimes\mathbf{1}\_{s}\right)\frac{d\mathbf{h}}{d\boldsymbol{\theta}^{\mathsf{T}}}}\_{\text{perturbations of }\mathbf{h}} + \underbrace{\left[(\mathbf{I}-\mathbf{H})^{\mathsf{T}}\otimes\mathbf{I}\_{s}\right]\frac{d\text{vec}\,\mathbf{B}}{d\boldsymbol{\theta}^{\mathsf{T}}}}\_{\text{perturbations of }\mathbf{B}}.\tag{8.18}$$
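The first term of (8.18) can be checked by perturbing each entry of **h** numerically. A NumPy sketch with illustrative names, taking *θ* = **h** so that the second term vanishes:

```python
import numpy as np

rng = np.random.default_rng(4)
s = 3
Bmat = rng.random((s, s))                            # demography without harvest
h = rng.uniform(0.05, 0.3, s)                        # stage-specific harvest probabilities
vec = lambda M: M.reshape(-1, order='F')             # column-major vec

A_of = lambda hh: Bmat @ (np.eye(s) - np.diag(hh))   # Eq. (8.14) with H = diag(h)

# first term of Eq. (8.18): d vec A / d h^T
dA_dh = -(np.kron(np.eye(s), Bmat)
          @ np.diag(vec(np.eye(s)))
          @ np.kron(np.eye(s), np.ones((s, 1))))

# finite-difference comparison, one column per entry of h
eps = 1e-7
fd = np.column_stack([(vec(A_of(h + eps * np.eye(s)[i])) - vec(A_of(h))) / eps
                      for i in range(s)])
ok = np.allclose(fd, dA_dh, atol=1e-5)
print(ok)
```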

The conditional probability model (8.3) has the same form as the harvest model (8.14), so a similar analysis applies to it as well:

$$d\operatorname{vec}\mathbf{U} = (\mathbf{I}\otimes\mathbf{G})\,\mathcal{D}\left(\operatorname{vec}\mathbf{I}\right)(\mathbf{I}\otimes\mathbf{1})\,d\boldsymbol{\sigma} + \left(\boldsymbol{\Sigma}^{\mathsf{T}}\otimes\mathbf{I}\right)d\operatorname{vec}\mathbf{G}.\tag{8.19}$$

However, the conditional transition matrix **G** is column-stochastic (all columns sum to 1), because all loss of individuals is accounted for by **Σ**. Thus relevant perturbations must be parameterized so that the stochasticity is preserved. For example, if **G** describes growth in the standard size-classified model (Caswell 2001, Section 4.2), e.g.,

$$\mathbf{G} = \begin{pmatrix} 1 - \gamma\_1 & 0 & 0 \\ \gamma\_1 & 1 - \gamma\_2 & 0 \\ 0 & \gamma\_2 & 1 \end{pmatrix} \tag{8.20}$$

then perturbations of the *γi* will preserve stochasticity of **G**. If **G** has no such convenient parameterization, then changes in the entries of **G** must be compensated for by changes elsewhere in the same column (see Caswell 2001; Hill et al. 2004;

<sup>2</sup>Alternatively, let *μi* be the mortality due to harvest experienced by an individual in stage *i*. Then **I** − **H** = exp [−D *(μ)*]. Harvest imposes an additional, additive hazard on top of the natural mortality contained in **B**.

Theorem 4.5 of Caswell 2013). For explicit formulas for compensation, see Chap. 11 of this volume; for an application, see van Daalen and Caswell (2017).

The harvest model (8.14) can be extended to describe harvest imposed at a specified phase within a *p*-cycle. Suppose that harvest takes place between phase *m* and phase *m* + 1, so that

$$\mathbf{A} = \mathbf{B}\_p \cdots \mathbf{B}\_{m+1} \left(\mathbf{I}-\mathbf{H}\right) \mathbf{B}\_m \cdots \mathbf{B}\_1 \tag{8.21}$$

(see Darwin and Williams (1964) for an early example of this kind of seasonal harvest model). Using the same approach, it can be shown that

$$\begin{split} \frac{d\text{vec}\,\mathbf{A}}{d\boldsymbol{\theta}^{\top}} &= -\left[\left(\mathbf{C}\_{1}^{m}\right)^{\top}\otimes\mathbf{C}\_{m+1}^{p}\right]\frac{d\text{vec}\,\mathbf{H}}{d\boldsymbol{\theta}^{\top}} \\ &+ \left[\mathbf{I}\_{s\_{1}}\otimes\mathbf{C}\_{m+1}^{p}\left(\mathbf{I}-\mathbf{H}\right)\right]\sum\_{i=1}^{m}\left[\left(\mathbf{C}\_{1}^{i-1}\right)^{\top}\otimes\mathbf{C}\_{i+1}^{m}\right]\frac{d\text{vec}\,\mathbf{B}\_{i}}{d\boldsymbol{\theta}^{\top}} \\ &+ \left[\left(\left(\mathbf{I}-\mathbf{H}\right)\mathbf{C}\_{1}^{m}\right)^{\top}\otimes\mathbf{I}\_{s\_{1}}\right]\sum\_{i=m+1}^{p}\left[\left(\mathbf{C}\_{m+1}^{i-1}\right)^{\top}\otimes\mathbf{C}\_{i+1}^{p}\right]\frac{d\text{vec}\,\mathbf{B}\_{i}}{d\boldsymbol{\theta}^{\top}} \end{split} \tag{8.22}$$

The expression (8.17) can be substituted for *d*vec **H** in (8.22), and the resulting expression for *d*vec **A***/dθ* <sup>T</sup> substituted into (8.5).

#### **8.3 Multistate Models**

We have encountered several examples of models in which individuals are classified by two criteria (age and stage, stage and environmental state, stage and location, etc.). These multistate models can be constructed by the vec-permutation matrix approach; see Chaps. 5 and 6 or Hunter and Caswell (2005b) and Caswell et al. (2018).

Suppose that individuals are classified by two criteria; e.g., stages (1*,...,s*) and locations (1*,...,r*). One might describe population dynamics in terms of stage transitions within locations, and spatial movement within stages, with the two processes acting sequentially. Thus individuals first survive and reproduce according to their stage-specific demography, then disperse among locations, and then repeat. Let **B***<sup>i</sup>* be the *s* × *s* matrix describing transitions and reproduction within location *i*, and **M***<sup>j</sup>* the *r* × *r* matrix describing movement probabilities for stage *j*. Let $\mathbb{B}$ and $\mathbb{M}$ be the *sr* × *sr* block-diagonal matrices with the **B***<sup>i</sup>* and the **M***<sup>j</sup>*, respectively, on the diagonal.

The population is projected by

$$\mathbf{n}(t+1) = \mathbf{K}^{\mathsf{T}} \mathbb{M} \mathbf{K} \mathbb{B} \, \mathbf{n}(t) \tag{8.23}$$

The matrix **K** is the vec-permutation matrix, or commutation matrix (Henderson and Searle 1981; Magnus and Neudecker 1979), which satisfies

$$\text{vec}\,\mathcal{N}^{\mathsf{T}} = \mathbf{K} \,\text{vec}\,\mathcal{N} \tag{8.24}$$

For the calculation of **K**, see Sect. 2.2.3.
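A minimal NumPy construction of **K** (illustrative; one of several equivalent constructions) uses the identity $\mathbf{K} = \sum\_{ij} \mathbf{E}\_{ij} \otimes \mathbf{E}\_{ij}^{\mathsf{T}}$, where **E***<sup>ij</sup>* is the *s* × *r* unit matrix with a 1 in position *(i, j)*:

```python
import numpy as np

s, r = 3, 2                                          # e.g., stages and locations
vec = lambda M: M.reshape(-1, order='F')             # column-major vec

# vec-permutation (commutation) matrix K, built from unit matrices E_ij
K = np.zeros((s * r, s * r))
for i in range(s):
    for j in range(r):
        Eij = np.zeros((s, r)); Eij[i, j] = 1.0
        K += np.kron(Eij, Eij.T)

N = np.arange(s * r, dtype=float).reshape(s, r)      # any s x r matrix
ok = np.allclose(K @ vec(N), vec(N.T))               # property (8.24)
print(ok)
```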

The model (8.23) is formally periodic, with the operation of B and M alternating; thus the projection matrix is

$$\mathbf{A} = \mathbf{K}^{\mathsf{T}} \mathbb{M} \mathbf{K} \mathbb{B}. \tag{8.25}$$

The dependence of **A** on the parameters *θ* can take place through **B***<sup>i</sup>* [*θ*], **M***<sup>i</sup>* [*θ*], or both.

The general sensitivity formula (8.5) requires the derivative *d***A***/dθ* <sup>T</sup> . Differentiating (8.25) gives

$$d\mathbf{A} = \mathbf{K}^{\mathsf{T}} \left( d\mathbb{M} \right) \mathbf{K} \mathbb{B} + \mathbf{K}^{\mathsf{T}} \mathbb{M} \mathbf{K} \left( d\mathbb{B} \right). \tag{8.26}$$

Applying the vec operator gives

$$d\operatorname{vec}\mathbf{A} = \left(\mathbb{B}^{\mathsf{T}}\mathbf{K}^{\mathsf{T}}\otimes\mathbf{K}^{\mathsf{T}}\right)d\operatorname{vec}\mathbb{M} + \left(\mathbf{I}\_{sr}\otimes\mathbf{K}^{\mathsf{T}}\mathbb{M}\mathbf{K}\right)d\operatorname{vec}\mathbb{B} \tag{8.27}$$

We want to express *d*vec $\mathbb{B}$ and *d*vec $\mathbb{M}$ in terms of the derivatives of their diagonal blocks **B***<sup>i</sup>* and **M***<sup>j</sup>*. This can be done using equations (14) and (15) of Caswell and van Daalen (2016). Define the matrices **P***<sup>i</sup>* and **Q***<sup>i</sup>*, of dimension *rs* × *s* and *s* × *rs*, respectively,

$$\mathbf{P}\_i = \begin{pmatrix} \mathbf{0}\_{s(i-1)\times s} \\ \mathbf{I}\_s \\ \mathbf{0}\_{s(r-i)\times s} \end{pmatrix} \qquad \mathbf{Q}\_i = \begin{pmatrix} \mathbf{0}\_{s\times(i-1)s} & \mathbf{I}\_s & \mathbf{0}\_{s\times(r-i)s} \end{pmatrix}. \tag{8.28}$$

Then

$$d\text{vec}\,\mathbb{B} = \sum\_{i=1}^{r} \left(\mathbf{Q}\_i^{\mathsf{T}} \otimes \mathbf{P}\_i\right) d\text{vec}\,\mathbf{B}\_i.\tag{8.29}$$

Similarly, for M, define matrices **R***<sup>i</sup>* and **S***<sup>i</sup>*

$$\mathbf{R}\_i = \begin{pmatrix} \mathbf{0}\_{r(i-1)\times r} \\ \mathbf{I}\_r \\ \mathbf{0}\_{r(s-i)\times r} \end{pmatrix} \qquad \mathbf{S}\_i = \begin{pmatrix} \mathbf{0}\_{r\times(i-1)r} & \mathbf{I}\_r & \mathbf{0}\_{r\times(s-i)r} \end{pmatrix}. \tag{8.30}$$

Then

$$d\text{vec}\,\mathbb{M} = \sum\_{i=1}^{s} \left(\mathbf{S}\_i^{\mathsf{T}} \otimes \mathbf{R}\_i\right) d\text{vec}\,\mathbf{M}\_i. \tag{8.31}$$

Substituting (8.29) and (8.31) into the expression (8.27) for *d*vec **A** gives the final result

$$\frac{d\text{vec}\,\mathbf{A}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \underbrace{\mathbf{X}\_1\sum\_{j=1}^{s} \left(\mathbf{S}\_j^{\mathsf{T}} \otimes \mathbf{R}\_j\right)\frac{d\text{vec}\,\mathbf{M}\_j}{d\boldsymbol{\theta}^{\mathsf{T}}}}\_{\text{perturbations of the }\mathbf{M}\_j} + \underbrace{\mathbf{X}\_2\sum\_{i=1}^{r} \left(\mathbf{Q}\_i^{\mathsf{T}} \otimes \mathbf{P}\_i\right)\frac{d\text{vec}\,\mathbf{B}\_i}{d\boldsymbol{\theta}^{\mathsf{T}}}}\_{\text{perturbations of the }\mathbf{B}\_i}\tag{8.32}$$

where **X**<sup>1</sup> and **X**<sup>2</sup> are constant matrices,

$$\mathbf{X}\_1 = \left(\mathbb{B}^{\mathsf{T}} \mathbf{K}^{\mathsf{T}} \otimes \mathbf{K}^{\mathsf{T}}\right) \tag{8.33}$$

$$\mathbf{X}\_2 = \left(\mathbf{I}\_{sr} \otimes \mathbf{K}^{\mathsf{T}} \mathbb{M} \mathbf{K}\right) \tag{8.34}$$

that need be calculated only once. Although **X**1, **X**2, and the Kronecker products appearing in the summations are large, they are also extremely sparse. The sparse matrix capabilities in MATLAB can take advantage of this fact. Substituting (8.32) into (8.5) gives the sensitivity of an output variable *ξ* to changes in parameters that perturb any or all of the **M***<sup>j</sup>* and **B***i*.
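The embedding identity (8.29) is easy to verify directly. The NumPy sketch below (illustrative names) assembles the block diagonal as $\sum\_i \mathbf{P}\_i \mathbf{B}\_i \mathbf{Q}\_i$, which underlies (8.29), and checks that its vec equals the sum of Kronecker terms:

```python
import numpy as np

rng = np.random.default_rng(6)
s, r = 2, 3                                          # stages, locations
B = [rng.random((s, s)) for _ in range(r)]           # per-location matrices B_i
vec = lambda M: M.reshape(-1, order='F')             # column-major vec

def P(i):                                            # rs x s, Eq. (8.28)
    out = np.zeros((r * s, s))
    out[(i - 1) * s:i * s, :] = np.eye(s)
    return out

Q = lambda i: P(i).T                                 # s x rs

# the block diagonal blockdiag(B_1,...,B_r) as sum_i P_i B_i Q_i
BB = sum(P(i) @ B[i - 1] @ Q(i) for i in range(1, r + 1))

# Eq. (8.29): vec of the block diagonal from the vecs of the blocks
rhs = sum(np.kron(Q(i).T, P(i)) @ vec(B[i - 1]) for i in range(1, r + 1))
ok = np.allclose(vec(BB), rhs)
print(ok)
```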

#### **8.4 Nonlinear Models and Delayed Density Dependence**

Anticipating the more extensive treatment in Chap. 10, we consider the effects of nonlinearity in periodic models. You may want to return to this section after Sect. 10.7, which analyzes periodic oscillations arising from time-invariant nonlinearities. When periodic environmental changes interact with such oscillations, the results can be complicated, and such interactions are the focus of the present section.

In a periodic nonlinear model, each of the **B***<sup>i</sup>* in (8.2) may depend on density. Especially in seasonal models, the vital rates in the matrix **B***<sup>i</sup>* may depend on densities not only at phase *i*, but at previous phases within the cycle as well. For example, in a study of the invasive plant garlic mustard (*Alliaria petiolata*) Shyu et al. (2013) found that seed production of fruiting plants in the fall reflected the density experienced by vegetative rosettes in the early spring.

To develop a model including such delayed density dependence, define

$$\mathbf{n}\_i(t) = \text{population at season } i \text{ in year } t \tag{8.35}$$

Starting at season 1, the dynamics are given by

$$\begin{aligned} \mathbf{n}\_1(t+1) &= \mathbf{B}\_p \mathbf{n}\_p(t) \\\\ \mathbf{n}\_2(t) &= \mathbf{B}\_1 \mathbf{n}\_1(t) \\\\ &\;\vdots \\\\ \mathbf{n}\_p(t) &= \mathbf{B}\_{p-1} \mathbf{n}\_{p-1}(t) \end{aligned} \tag{8.36}$$

Density-dependence, in a general form, means that the matrices **B***<sup>i</sup>* may be functions of densities over one cycle prior to season *i*:

$$\begin{aligned} \mathbf{B}\_1 &= \mathbf{B}\_1 \left[ \mathbf{n}\_1(t), \mathbf{n}\_p(t-1), \dots, \mathbf{n}\_2(t-1) \right] \\\\ \mathbf{B}\_2 &= \mathbf{B}\_2 \left[ \mathbf{n}\_2(t), \mathbf{n}\_1(t), \mathbf{n}\_p(t-1), \dots, \mathbf{n}\_3(t-1) \right] \\\\ &\vdots \\\\ \mathbf{B}\_p &= \mathbf{B}\_p \left[ \mathbf{n}\_p(t), \mathbf{n}\_{p-1}(t), \dots, \mathbf{n}\_1(t) \right] \end{aligned} \tag{8.37}$$

A fixed point on the interannual time scale, from *t* to *t* + 1, is a *p*-cycle on the seasonal scale, satisfying

$$\begin{aligned} \hat{\mathbf{n}}\_1 &= \mathbf{B}\_p \left[ \hat{\mathbf{n}}\_1, \dots, \hat{\mathbf{n}}\_p \right] \hat{\mathbf{n}}\_p \\\\ \hat{\mathbf{n}}\_2 &= \mathbf{B}\_1 \left[ \hat{\mathbf{n}}\_1, \dots, \hat{\mathbf{n}}\_p \right] \hat{\mathbf{n}}\_1 \\\\ &\;\vdots \\\\ \hat{\mathbf{n}}\_p &= \mathbf{B}\_{p-1} \left[ \hat{\mathbf{n}}\_1, \dots, \hat{\mathbf{n}}\_p \right] \hat{\mathbf{n}}\_{p-1} \end{aligned} \tag{8.38}$$

A *k*-cycle on the interannual time scale is a *kp*-cycle on the seasonal time scale, the points of which are numbered $\hat{\mathbf{n}}\_1, \ldots, \hat{\mathbf{n}}\_{kp}$. The corresponding sequence of matrices, in which the annual cycle $\mathbf{B}\_1, \ldots, \mathbf{B}\_p$ is repeated *k* times, is defined as $\mathbf{B}\_1, \ldots, \mathbf{B}\_{kp}$. With this notation, (8.38) still holds, with *kp* instead of *p* entries.

Differentiating (8.38) yields

$$d\hat{\mathbf{n}}\_i = (d\mathbf{B}\_{i-1})\,\hat{\mathbf{n}}\_{i-1} + \mathbf{B}\_{i-1} \left(d\hat{\mathbf{n}}\_{i-1}\right) \qquad i = 1, \ldots, p \tag{8.39}$$

where the subscripts on **n**ˆ and **B** are interpreted modulo *p*. Applying the vec operator to (8.39) yields

$$d\hat{\mathbf{n}}\_i = \left(\hat{\mathbf{n}}\_{i-1}^{\mathsf{T}} \otimes \mathbf{I}\_s\right) d\text{vec}\,\mathbf{B}\_{i-1} + \mathbf{B}\_{i-1}\, d\hat{\mathbf{n}}\_{i-1}.\tag{8.40}$$

The sensitivity analysis of the cycle involves a set of block-structured matrices, the form of which is easily generalized from the special case with *p* = 3. Assuming *p* = 3 and noting that **B** depends on all the **n**ˆ*<sup>i</sup>* as well as on the parameter vector *θ*, the total differential of **B***i*−<sup>1</sup> in (8.40) is

$$d\text{vec}\,\mathbf{B}\_{i-1} = \frac{\partial \text{vec}\,\mathbf{B}\_{i-1}}{\partial \mathbf{n}\_1^{\mathsf{T}}} d\hat{\mathbf{n}}\_1 + \frac{\partial \text{vec}\,\mathbf{B}\_{i-1}}{\partial \mathbf{n}\_2^{\mathsf{T}}} d\hat{\mathbf{n}}\_2 + \frac{\partial \text{vec}\,\mathbf{B}\_{i-1}}{\partial \mathbf{n}\_3^{\mathsf{T}}} d\hat{\mathbf{n}}\_3 + \frac{\partial \text{vec}\,\mathbf{B}\_{i-1}}{\partial \boldsymbol{\theta}^{\mathsf{T}}} d\boldsymbol{\theta} \tag{8.41}$$

For notational convenience, define the matrices

$$\mathbf{H}\_i = \left(\hat{\mathbf{n}}\_i^{\mathsf{T}} \otimes \mathbf{I}\_s\right) \qquad i = 1, \ldots, p \tag{8.42}$$

Substituting (8.41) into (8.40) produces the set of equations

$$\begin{aligned} d\hat{\mathbf{n}}\_1 &= \mathbf{H}\_3 \frac{\partial \text{vec}\,\mathbf{B}\_3}{\partial \boldsymbol{\theta}^{\mathsf{T}}} d\boldsymbol{\theta} + \mathbf{H}\_3 \sum\_{j=1}^{p} \frac{\partial \text{vec}\,\mathbf{B}\_3}{\partial \mathbf{n}\_j^{\mathsf{T}}} d\hat{\mathbf{n}}\_j + \mathbf{B}\_3\, d\hat{\mathbf{n}}\_3 \\\\ d\hat{\mathbf{n}}\_2 &= \mathbf{H}\_1 \frac{\partial \text{vec}\,\mathbf{B}\_1}{\partial \boldsymbol{\theta}^{\mathsf{T}}} d\boldsymbol{\theta} + \mathbf{H}\_1 \sum\_{j=1}^{p} \frac{\partial \text{vec}\,\mathbf{B}\_1}{\partial \mathbf{n}\_j^{\mathsf{T}}} d\hat{\mathbf{n}}\_j + \mathbf{B}\_1\, d\hat{\mathbf{n}}\_1 \\\\ d\hat{\mathbf{n}}\_3 &= \mathbf{H}\_2 \frac{\partial \text{vec}\,\mathbf{B}\_2}{\partial \boldsymbol{\theta}^{\mathsf{T}}} d\boldsymbol{\theta} + \mathbf{H}\_2 \sum\_{j=1}^{p} \frac{\partial \text{vec}\,\mathbf{B}\_2}{\partial \mathbf{n}\_j^{\mathsf{T}}} d\hat{\mathbf{n}}\_j + \mathbf{B}\_2\, d\hat{\mathbf{n}}\_2 \end{aligned} \tag{8.43}$$

This set of equations can be reduced to a single equation by collecting all the points on the *kp*-cycle into a single vector. Write an array (of dimension *sp* × *k*)

$$\mathcal{N} = \begin{pmatrix} \hat{\mathbf{n}}\_1 & \cdots & \hat{\mathbf{n}}\_{(k-1)p+1} \\\\ \vdots & & \vdots \\\\ \hat{\mathbf{n}}\_p & \cdots & \hat{\mathbf{n}}\_{kp} \end{pmatrix} \tag{8.44}$$

in which row block *i* contains the season-*i* vectors and column *j* the vectors for year *j*.

Then write the vector (of dimension *spk* × 1)

$$\mathbb{N} = \text{vec}\,\mathcal{N}\tag{8.45}$$

In terms of this vector, the set of equations (8.43) can be rewritten

$$\frac{d\mathbb{N}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left[\mathbf{I}\_{skp} - \mathbb{B} - \mathbb{H}\mathbb{C}\right]^{-1} \mathbb{H}\mathbb{D}.\tag{8.46}$$

where $\mathbb{H}$ and $\mathbb{B}$ are the block-circulant matrices

$$\mathbb{H} = \begin{pmatrix} 0 & 0 & \mathbf{H}\_3 \\ \mathbf{H}\_1 & 0 & 0 \\ 0 & \mathbf{H}\_2 & 0 \end{pmatrix} \tag{8.47}$$

$$\mathbb{B} = \begin{pmatrix} 0 & 0 & \mathbf{B}\_3 \\ \mathbf{B}\_1 & 0 & 0 \\ 0 & \mathbf{B}\_2 & 0 \end{pmatrix}, \tag{8.48}$$

and $\mathbb{C}$ and $\mathbb{D}$ are the block matrices

$$\mathbb{C} = \begin{pmatrix} \frac{\partial \text{vec}\,\mathbf{B}\_1}{\partial \mathbf{n}\_1^{\mathsf{T}}} & \cdots & \frac{\partial \text{vec}\,\mathbf{B}\_1}{\partial \mathbf{n}\_3^{\mathsf{T}}} \\ \vdots & \ddots & \vdots \\ \frac{\partial \text{vec}\,\mathbf{B}\_3}{\partial \mathbf{n}\_1^{\mathsf{T}}} & \cdots & \frac{\partial \text{vec}\,\mathbf{B}\_3}{\partial \mathbf{n}\_3^{\mathsf{T}}} \end{pmatrix} \tag{8.49}$$

$$\mathbb{D} = \begin{pmatrix} \frac{\partial \text{vec} \, \mathbf{B}\_1}{\partial \boldsymbol{\theta}^\top} \\\\ \frac{\partial \text{vec} \, \mathbf{B}\_2}{\partial \boldsymbol{\theta}^\top} \\\\ \frac{\partial \text{vec} \, \mathbf{B}\_3}{\partial \boldsymbol{\theta}^\top} \end{pmatrix}. \tag{8.50}$$

All the derivatives are evaluated at $\hat{\mathbf{n}}\_1, \ldots, \hat{\mathbf{n}}\_3$.

#### *8.4.1 Averages*

The vector *d*N*/dθ* <sup>T</sup> created by (8.46) contains the sensitivities of all *s* stages, at each of *p* seasons within the year, for each of the *k* years within the inter-annual *k*-cycle. If this is too much information, one can calculate the sensitivity of averages, or other linear combinations, taken in various ways.

To write these averages, let **b***<sup>m</sup>* be an *m* × 1 vector of weights. For a simple average of *m* quantities, each entry of **b***<sup>m</sup>* is 1*/m*; for a weighted average, the entries of **b***<sup>m</sup>* would be non-negative numbers summing to 1. More generally, **b** may contain arbitrary weights, such as biomass, metabolic rate, economic value, etc. See Chaps. 7 and 10. To calculate averages, apply these weight vectors over the rows (stages) or columns (seasons, years) of $\mathcal{N}$, and then apply the vec operator to express the results in terms of $\mathbb{N}$.

**Annual fixed point** If the dynamics are a fixed point on the annual time scale, averages can be calculated over stages (using a vector **b***s*), over seasons (using a vector **b***p*), or both. The *p* × 1 vector of averages over stages is

$$\text{avg. over stages} = \text{vec}\left(\mathbf{b}\_s^{\mathsf{T}}\mathcal{N}\right) \tag{8.51}$$

$$= \left(\mathbf{I}\_p \otimes \mathbf{b}\_s^{\mathsf{T}}\right) \mathbb{N} \tag{8.52}$$

The *s* × 1 vector of averages over seasons is

$$\text{avg. over season} = \left(\mathbf{b}\_p^{\mathsf{T}} \otimes \mathbf{I}\_s\right) \mathbb{N} \tag{8.53}$$

The average over both seasons and stages (a scalar) is

$$\text{avg. over stages and seasons} = \left(\mathbf{b}\_p^\mathsf{T} \otimes \mathbf{b}\_s^\mathsf{T}\right) \mathbb{N} \tag{8.54}$$

Because the average is a linear operator, the sensitivities of these averages are obtained by applying the same weights to the derivative *d*N*/dθ* <sup>T</sup> in (8.46):

$$\text{sensitivity of avg. over stages} = \left(\mathbf{I}\_p \otimes \mathbf{b}\_s^{\mathsf{T}}\right) \frac{d\mathbb{N}}{d\boldsymbol{\theta}^{\mathsf{T}}}\tag{8.55}$$

$$\text{sensitivity of avg. over season} = \left(\mathbf{b}\_p^\mathsf{T} \otimes \mathbf{I}\_s\right) \frac{d\mathbb{N}}{d\boldsymbol{\theta}^\mathsf{T}}\tag{8.56}$$

$$\text{sensitivity of avg. over both} = \left(\mathbf{b}\_p^\mathsf{T} \otimes \mathbf{b}\_s^\mathsf{T}\right) \frac{d\mathbb{N}}{d\boldsymbol{\theta}^\mathsf{T}}\tag{8.57}$$

**Annual** *k***-cycle** When the dynamics produce a *k*-cycle on the annual time scale, averages can be calculated over any desired combination of stages, seasons, and years. Table 8.2 gives the resulting expressions for the averages. As in the case of equations (8.55) and (8.56), the sensitivities of these averages to parameters are obtained by applying the same weights to *d*N*/dθ* <sup>T</sup> .
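The averaging operators for the fixed-point case can be illustrated directly. In the NumPy sketch below (illustrative names), the *p*-cycle is stored as an *s* × *p* array whose column-major vec is $\mathbb{N}$, and the Kronecker forms (8.52)-(8.54) reproduce ordinary column, row, and grand means when the weights are uniform:

```python
import numpy as np

rng = np.random.default_rng(7)
s, p = 3, 4                                          # stages, seasons (annual fixed point, k = 1)
N = rng.random((s, p))                               # column i holds the season-i stage vector
NN = N.reshape(-1, order='F')                        # the vector N of Eq. (8.45)

bs = np.ones(s) / s                                  # simple average over stages
bp = np.ones(p) / p                                  # simple average over seasons

avg_stages = np.kron(np.eye(p), bs) @ NN             # Eq. (8.52): p values, one per season
avg_seasons = np.kron(bp, np.eye(s)) @ NN            # Eq. (8.53): s values, one per stage
avg_both = np.kron(bp, bs) @ NN                      # Eq. (8.54): a scalar
```

Replacing `bs` or `bp` with non-uniform, non-negative weights (biomass, economic value, . . . ) gives the corresponding weighted averages, and the same operators applied to *d*$\mathbb{N}$*/dθ*<sup>T</sup> give their sensitivities, as in (8.55)-(8.57).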

#### *8.4.2 A Nonlinear Example*

As an example of the calculations for nonlinear systems, imagine an organism with two stages: immature juveniles and reproducing adults. Suppose that the year contains two seasons: a benign, reproduction-heavy Season 1 and a harsh, mortality-heavy Season 2. The life cycle graph is shown in Fig. 8.2. Adults in Season 1 all survive to Season 2 and give birth to new juveniles with per-capita fertility *f*, which depends on adult density in Season 1 according to $f[\mathbf{n}\_1] = a e^{-b n\_2}$, where *a* and *b* are the maximum fertility and the strength of density-dependence, respectively, and

**Table 8.2** Calculation of averages of attractors of nonlinear periodic matrix population models. The upper half of the table shows averages over stages and over seasons when the dynamics are a fixed point on the inter-annual time scale, and thus a *p*-cycle on the seasonal time scale. The lower half of the table shows averages over all combinations of stages, seasons, and years, when the dynamics are a *k*-cycle on the inter-annual time scale, and thus a *kp*-cycle on the seasonal time scale


*n*<sub>2</sub> is the adult density in Season 1. In the harsher Season 2, juveniles and adults survive with probabilities *sj* and *sa*. A juvenile that survives to Season 1 matures into an adult.

This life cycle produces seasonal transition matrices **B**<sup>1</sup> and **B**2:

$$\mathbf{B}\_1[\mathbf{n}\_1] = \begin{pmatrix} 0 & f[\mathbf{n}\_1] \\ 0 & 1 \end{pmatrix} \tag{8.58}$$

$$\mathbf{B}\_2 = \begin{pmatrix} 0 & 0 \\ s\_j & s\_a \end{pmatrix} \tag{8.59}$$

**Fig. 8.3** A bifurcation diagram on the seasonal time scale for the two-season, two-stage model of Fig. 8.2. Total densities are plotted for Season 1 (•) and 2 (+). Parameters: *a* = 20, *b* = 1, *sj* = 0.5; *sa* varied from 0 to 1

and the nonlinear periodic model

$$
\mathbf{n}\_1(t+1) = \mathbf{B}\_2 \mathbf{n}\_2(t)
$$

$$
\mathbf{n}\_2(t) = \mathbf{B}\_1\left[\mathbf{n}\_1(t)\right] \mathbf{n}\_1(t) \tag{8.60}
$$

Figure 8.3 is a bifurcation diagram for the system (8.60) in response to changes in adult survival *sa*. When adults are long-lived (*sa* > 0.22) there is a 2-cycle on the seasonal scale, corresponding to a fixed point on the annual time scale, satisfying

$$
\hat{\mathbf{n}}\_1 = \mathbf{B}\_2 \hat{\mathbf{n}}\_2 \tag{8.61}
$$

$$\hat{\mathbf{n}}\_2 = \mathbf{B}\_1[\hat{\mathbf{n}}\_1] \hat{\mathbf{n}}\_1 \tag{8.62}$$

At *sa* ≈ 0.22 this 2-cycle bifurcates to a 4-cycle on the seasonal time scale, corresponding to a 2-cycle on the annual scale.
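The dynamics in Fig. 8.3 can be reproduced by iterating the model (8.60) directly. A minimal sketch in Python (the book's supplement uses MATLAB); the initial vector and iteration length are arbitrary choices:

```python
import numpy as np

def simulate_seasons(sa, sj=0.5, a=20.0, b=1.0, T=2000):
    """Iterate the two-season model (8.60); return the Season-1
    vectors n1(t) = (juveniles, adults) for each year t."""
    n1 = np.array([0.1, 0.1])
    out = []
    for _ in range(T):
        f = a * np.exp(-b * n1[1])       # density-dependent fertility
        B1 = np.array([[0.0, f],
                       [0.0, 1.0]])      # Season 1: births; all adults survive
        B2 = np.array([[0.0, 0.0],
                       [sj,  sa]])       # Season 2: survival and maturation
        n2 = B1 @ n1                     # population entering Season 2
        n1 = B2 @ n2                     # population entering next Season 1
        out.append(n1.copy())
    return np.array(out)

# sa = 0.4: fixed point on the annual scale (seasonal 2-cycle);
# sa = 0.1: 2-cycle on the annual scale (seasonal 4-cycle).
fixed = simulate_seasons(0.4)
cycle = simulate_seasons(0.1)
```

At *sa* = 0.4 successive years converge to the same vector, while at *sa* = 0.1 the trajectory alternates between two Season-1 vectors, consistent with the bifurcation near *sa* ≈ 0.22 in Fig. 8.3.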

To derive the block matrices C and D in Eqs. (8.49) and (8.50), define the parameter vector as

$$
\boldsymbol{\theta} = \begin{pmatrix} s\_j & s\_a & a & b \end{pmatrix}^{\mathsf{T}}. \tag{8.63}
$$

The derivative matrices are

$$\frac{d\,\mathbf{vec}\,\mathbf{B}\_1}{d\boldsymbol{\theta}^{\mathsf{T}}} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & e^{-b\hat{n}\_2} & -a\hat{n}\_2 e^{-b\hat{n}\_2} \\ 0 & 0 & 0 & 0 \end{pmatrix} \tag{8.64}$$

$$\frac{d\,\mathbf{vec}\,\mathbf{B}\_2}{d\boldsymbol{\theta}^{\mathsf{T}}} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix} \tag{8.65}$$

$$\frac{d\,\mathbf{vec}\,\mathbf{B}\_1}{d\mathbf{n}^{\mathsf{T}}} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 0 & -ab\,e^{-b\hat{n}\_2} \\ 0 & 0 \end{pmatrix} \tag{8.66}$$

$$\frac{d\,\mathbf{vec}\,\mathbf{B}\_2}{d\mathbf{n}^{\mathsf{T}}} = \mathbf{0} \qquad (\text{dimension } 4 \times 2) \tag{8.67}$$

We calculate the sensitivities of the equilibrium population at each phase of the cycle using Eq. (8.46) with *sa* = 0.4 (a 2-cycle on the seasonal time scale; see Fig. 8.3) and with *sa* = 0.1 (a 4-cycle on the seasonal time scale). The results, and the sensitivities of several averages, are shown in Fig. 8.4.

At the seasonal 2-cycle (annual fixed point), increases in *sj* or *sa* increase density in Season 1 and reduce density in Season 2, and have little effect on the

**Fig. 8.4** Sensitivities of equilibrium total population size in Seasons 1 and 2, as well as the annual population average, to the demographic parameters *sj*, *sa*, *a*, and *b*. Left: sensitivities when *sa* = 0.4 (seasonal 2-cycle, annual equilibrium). Right: sensitivities when *sa* = 0.1 (seasonal 4-cycle, annual 2-cycle)

density averaged over seasons. The maximum fertility level *a* has little effect at either season, and the density-dependent parameter *b* has large negative effects throughout.

At the 4-point seasonal cycle (2-cycle on the annual time scale), the patterns are more complicated. We describe them in terms of the *kp* = 4 seasons in the cycle. The maximum fertility *a* has little effect at any point. The survival probabilities *sj* and *sa* have effects that are opposite in sign: an increase in *sj* increases the density in seasons 1 and 4, and reduces it in seasons 2 and 3. An increase in *sa* has the opposite effect. Averaged over years, both *sa* and *sj* increase density in season 1 and reduce it in season 2, thus increasing the amplitude of the oscillation. Averaged over seasons, *sa* and *sj* have opposite effects. When averaged over stages, seasons, and years, the effects of *sa* cancel each other out, and only *sj* and *b* have appreciable effects.

Even in this simple example, it is clear that parameter changes can have effects that differ among seasons and years. A set of MATLAB scripts to carry out these calculations appears in an online supplement to Caswell and Shyu (2012).

#### **8.5 LTRE Decomposition Analysis**

The LTRE decomposition analysis introduced in Sects. 2.9 and 4.5 can be extended to obtain the contributions, to any given outcome, of differences in parameters at each phase of the cycle.

Suppose that *ξ* is an *m* × 1 dependent variable (scalar or vector-valued), a function of a parameter vector *θ* that takes on values *θ*<sub>1</sub>*,..., θ<sub>p</sub>* over the cycle. Use superscripts to denote two conditions,<sup>3</sup> which produce results *ξ*<sup>(1)</sup> and *ξ*<sup>(2)</sup>:

$$
\theta\_1^{(1)}, \dots, \theta\_p^{(1)} \to \xi^{(1)}\tag{8.68}
$$

$$\boldsymbol{\theta}\_1^{(2)}, \dots, \boldsymbol{\theta}\_p^{(2)} \to \boldsymbol{\xi}^{(2)}\tag{8.69}$$

To first order, the effect on *ξ* is

$$\left(\boldsymbol{\xi}^{(2)} - \boldsymbol{\xi}^{(1)}\right) \approx \sum\_{k=1}^{p} \frac{d\boldsymbol{\xi}}{d\theta\_k^{\sf T}} \left(\boldsymbol{\theta}\_k^{(2)} - \boldsymbol{\theta}\_k^{(1)}\right) \tag{8.70}$$

The *k*th term in the summation in (8.70) is the total contribution, over all of the parameters in *θ*, of parameter differences in phase *k* of the cycle.

Define **R**<sub>*k*</sub> as an *m* × *p* contribution matrix whose entries are the contributions of parameter *θ<sub>j</sub>* in phase *k* to the effects on outcome variable *ξ<sub>i</sub>*. Then

$$\mathbf{R}\_k = \frac{d\boldsymbol{\xi}}{d\boldsymbol{\theta}\_k^{\mathsf{T}}} \mathcal{D}\left(\boldsymbol{\theta}\_k^{(2)} - \boldsymbol{\theta}\_k^{(1)}\right) \tag{8.71}$$

<sup>3</sup>The extension to more than two conditions is easy; see Caswell (2001).

where the derivative is evaluated at the average of *θ*<sup>(1)</sup> and *θ*<sup>(2)</sup>. These contributions are a decomposition of the approximate effect in (8.70),

$$
\xi^{(2)} - \xi^{(1)} \approx \sum\_{k=1}^{p} \mathbf{R}\_k \mathbf{1}\_p \tag{8.72}
$$

The contribution matrix (8.71) requires *dξ/dθ*<sub>*k*</sub><sup>T</sup>, that is, the derivative of *ξ* with respect to the parameters at phase *k* of the cycle. In the linear model (8.2), this is given by the *k*th term in the summation in (8.11). In the case of the nonlinear model (8.36), the derivative is obtained from Eq. (8.46) by setting all blocks of D, except those corresponding to phase *k*, to zero.
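The phase-specific LTRE contributions in (8.70)–(8.72) amount to a few matrix products. A sketch, in which the derivatives and parameter values are random placeholders standing in for the output of Eq. (8.46):

```python
import numpy as np

m, p, phases = 3, 4, 2                    # hypothetical dimensions
rng = np.random.default_rng(0)

# dxi[k]: derivative of xi w.r.t. theta at phase k (m x p), evaluated at the
# mean parameters; theta1[k], theta2[k]: parameters under the two conditions.
dxi = [rng.random((m, p)) for _ in range(phases)]
theta1 = [rng.random(p) for _ in range(phases)]
theta2 = [rng.random(p) for _ in range(phases)]

# Eq. (8.71): contribution matrix for phase k; D(x) denotes diag(x).
R = [dxi[k] @ np.diag(theta2[k] - theta1[k]) for k in range(phases)]

# Eq. (8.72): summing contributions over parameters (columns) and phases
# recovers the first-order approximation (8.70) to the effect on xi.
effect = sum(Rk @ np.ones(p) for Rk in R)
first_order = sum(dxi[k] @ (theta2[k] - theta1[k]) for k in range(phases))
```

Because the decomposition is exactly the first-order expansion rearranged, `effect` and `first_order` agree to machine precision.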

#### **8.6 Discussion**

The distinguishing feature of periodic models is that the dynamics over a projection interval are given by a periodic product of matrices. The periodic product may reflect the existence of multiple timescales (e.g., seasonal and annual), or the operation of multiple processes (e.g., demography and harvest), or express conditional probabilities, or arise from classifying individuals by multiple criteria. The sensitivity analysis of periodic models must account for the chain of causation (Fig. 8.1) from the demographic parameters at each phase in the cycle to the corresponding projection matrices, thence to the periodic matrix product over the whole cycle, and finally to the demographic outcome *ξ*. Matrix calculus makes this easy to do, starting with a simple chain rule expression (see Eq. (8.5)) and then using an appropriate version of (8.9) to calculate the derivative *d*vec **A***/dθ*<sup>T</sup>.

#### **Bibliography**




# **Chapter 9 LTRE Decomposition of the Stochastic Growth Rate**

#### **9.1 Introduction**

The basic unit of comparative demography is a study that reports the value of some demographic outcome in two populations that differ in a set of vital rates. One challenge of such studies is to account for the difference in outcomes by decomposing it into contributions from differences in each of the parameters. It frequently happens that small differences in some parameters make large contributions to the difference in outcomes, and vice versa.

In some parts of the literature, such studies are called life table response experiment (or LTRE) analyses; versions of this analysis have appeared in Sect. 1.3.1 and Chaps. 2, 4, and 8. The term was introduced in the context of laboratory studies of the population effects of pollutants, hence the use of the word "experiment" (Caswell 1989). The conditions among which the populations are compared will be called "treatments" here, but there is no restriction to experimental manipulations.

Similar decomposition analyses have been developed independently in ecology and human demography. For example, Pollard's (1988) study of life expectancy used methods very similar to LTRE analyses of the population growth rate. Horiuchi et al. (2008) developed a method for continuous variables that is essentially identical to that used by ecologists for regression LTRE calculations (Caswell 1996). Canudas Romo (2003) reviews the human demographic literature.

This chapter uses matrix calculus to extend LTRE analysis to stochastic models, by showing how to decompose differences in the stochastic growth rate, log *λs*. Because stochastic models include both environmental fluctuations and the vital rate responses to those fluctuations, their structure is richer than that of time-

Chapter 9 is modified from: Caswell, H. 2010. Life table response experiment analysis of the stochastic growth rate. Journal of Ecology 98:324–333. ©Hal Caswell.


invariant models. Stochastic LTRE analysis thus requires a new approach to decomposing these differences. The payoffs, in terms of demographic and biological understanding, are great.

#### **9.2 Decomposition with Derivatives**

The familiar LTRE analysis uses derivatives to approximate the contributions of the vital rates to some (vector-valued) outcome *ξ* (dimension *q* × 1), as described in Chap. 2. Suppose that *ξ* depends on a vector *θ* of vital rates (dimension *p* × 1), and that observations are available under two treatments, with

$$\boldsymbol{\theta}^{(1)} \longrightarrow \boldsymbol{\xi}^{(1)} \tag{9.1}$$

$$
\boldsymbol{\theta}^{(2)} \longrightarrow \boldsymbol{\xi}^{(2)}. \tag{9.2}
$$

Using matrix calculus notation, to first order,

$$
\boldsymbol{\xi}^{(2)} - \boldsymbol{\xi}^{(1)} \approx \frac{d\boldsymbol{\xi}}{d\boldsymbol{\theta}^{\mathsf{T}}} \left( \boldsymbol{\theta}^{(2)} - \boldsymbol{\theta}^{(1)} \right), \tag{9.3}
$$

where the derivative of *ξ* is evaluated at the mean of the two parameter vectors.

All the contributions to the difference *ξ*<sup>(2)</sup> − *ξ*<sup>(1)</sup> are contained in a matrix **C** (dimension *q* × *p*) given by

$$\mathbf{C} = \frac{d\boldsymbol{\xi}}{d\boldsymbol{\theta}^{\mathsf{T}}} \mathcal{D} \left( \boldsymbol{\theta}^{(2)} - \boldsymbol{\theta}^{(1)} \right) \tag{9.4}$$

where the derivative is evaluated at the mean of *θ*<sup>(1)</sup> and *θ*<sup>(2)</sup>.

The entry **C***(i, j)* of the contribution matrix is the contribution of the difference in *θ<sub>j</sub>* to the difference in *ξ<sub>i</sub>*. The columns and rows of **C** give

$$\mathbf{C}(:,j) = \text{contribution of } \Delta\theta\_j \text{ to } \Delta\xi \tag{9.5}$$

$$\mathbf{C}(i,:) = \text{contribution of } \Delta\theta \text{ to } \Delta\xi\_{i}. \tag{9.6}$$

The sum over rows of **C** is the approximation (9.3) to the treatment effect on *ξ*

$$
\boldsymbol{\xi}^{(2)} - \boldsymbol{\xi}^{(1)} \approx \mathbf{C} \mathbf{1}\_p. \tag{9.7}
$$

The accuracy of this approximation gives a measure of the adequacy of the first-order assumption. Contributions can be small either because the treatment has little effect on *θ<sub>i</sub>* or because *ξ* does not respond much to changes in *θ<sub>i</sub>*.

The contribution matrix **C** takes advantage of matrix calculus to provide a simple calculation for the decomposition of scalar-, vector-, or matrix-valued differences. Studies including more than two treatments or conditions are analyzed by defining a reference parameter vector *θ<sub>r</sub>* and calculating a matrix **C**<sub>*i*</sub> for treatment *i* in terms of the parameter difference *θ<sub>i</sub>* − *θ<sub>r</sub>*. The reference treatment might be the average parameter set, or the parameters for a "control" condition, etc.
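A concrete sketch of Eqs. (9.3)–(9.7), using the dominant eigenvalue of a 2 × 2 matrix as the outcome *ξ* and central finite differences for the derivative; the parameter values are hypothetical:

```python
import numpy as np

def lam(theta):
    """Outcome xi: dominant eigenvalue of the 2x2 matrix whose
    column-wise vec is theta."""
    return np.linalg.eigvals(theta.reshape(2, 2, order="F")).real.max()

def dlam(theta, h=1e-6):
    """Row vector d(xi)/d(theta'), by central finite differences."""
    g = np.zeros((1, theta.size))
    for j in range(theta.size):
        e = np.zeros(theta.size)
        e[j] = h
        g[0, j] = (lam(theta + e) - lam(theta - e)) / (2 * h)
    return g

theta1 = np.array([0.0, 0.9, 1.5, 0.70])   # hypothetical treatment 1 rates
theta2 = np.array([0.0, 0.8, 2.0, 0.75])   # hypothetical treatment 2 rates

# Eq. (9.4): derivative at the mean parameter vector, times diag(differences).
C = dlam((theta1 + theta2) / 2) @ np.diag(theta2 - theta1)

# Eq. (9.7): the row sum of C approximates the treatment effect on xi.
approx = C.sum()
exact = lam(theta2) - lam(theta1)
```

The gap between `approx` and `exact` measures the adequacy of the first-order assumption; here it is small because the treatments differ modestly.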

#### **9.3 Kitagawa and Keyfitz: Decomposition Without Derivatives**

In decomposing differences in the stochastic growth rate, we encounter variables for which the derivatives in (9.3) cannot be calculated. Fortunately, an alternative method for decomposition is available that does not rely on derivatives. It was introduced by Kitagawa (1955) to explore the effects of age-specific death rates and of age distribution on crude death rates. The method was later extended by Keyfitz to decompose differences in age distributions, dependency ratios, and population growth rates into contributions from the entire mortality and fertility schedules (Keyfitz 1968, Section 7.4; Keyfitz and Caswell 2005, Section 10.1). Canudas Romo (2003) summarizes more recent extensions of the approach in demography.

Suppose that *ξ* depends on two variables, with values *(a, b)* in Treatment 1 and *(A, B)* in Treatment 2. Thus

$$
\xi^{(1)} = \xi[a, b] \tag{9.8}
$$

$$
\xi^{(2)} = \xi[A, B]. \tag{9.9}
$$

To decompose the treatment effect *ξ* [*A, B*] − *ξ* [*a, b*] into contributions from *A* − *a* and *B* − *b*, the Kitagawa-Keyfitz method proceeds by exchanging variables between the two treatments and calculating *ξ* for all possible combinations. The effect of *A* − *a*, against the background of *B*, is *ξ* [*A, B*] − *ξ* [*a, B*]. The effect of *A* − *a*, against the background of *b*, is *ξ* [*A, b*] − *ξ* [*a, b*]. The overall contribution of *A* − *a* is obtained by averaging its effect against the two backgrounds:

$$C(A - a) = \tfrac{1}{2}\left(\xi[A, B] - \xi[a, B]\right) + \tfrac{1}{2}\left(\xi[A, b] - \xi[a, b]\right). \tag{9.10}$$

Similarly, the contribution of *B* − *b* is

$$C(B - b) = \tfrac{1}{2}\left(\xi[A, B] - \xi[A, b]\right) + \tfrac{1}{2}\left(\xi[a, B] - \xi[a, b]\right). \tag{9.11}$$

If this appears familiar, it may be because this process of averaging differences across different backgrounds is precisely analogous to the calculation of main effects in a two-way ANOVA (e.g., Steel and Torrie 1960, Section 11.2).
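The exchange-and-average logic of (9.10)–(9.11) packages neatly as a function; the quadratic *ξ* below is an arbitrary hypothetical example:

```python
def kk_decompose(xi, a, b, A, B):
    """Kitagawa-Keyfitz decomposition of xi[A, B] - xi[a, b] into the
    contributions of A - a and B - b (Eqs. 9.10 and 9.11), averaging each
    difference over the two possible backgrounds."""
    cA = 0.5 * (xi(A, B) - xi(a, B)) + 0.5 * (xi(A, b) - xi(a, b))
    cB = 0.5 * (xi(A, B) - xi(A, b)) + 0.5 * (xi(a, B) - xi(a, b))
    return cA, cB

# Hypothetical nonlinear outcome.
xi = lambda x, y: x * y + x ** 2
cA, cB = kk_decompose(xi, a=1.0, b=2.0, A=1.5, B=3.0)
```

With two variables the decomposition is exact: the cross terms cancel when the two contributions are added, so *C(A − a)* + *C(B − b)* reproduces *ξ*[*A, B*] − *ξ*[*a, b*] with no remainder, for any function *ξ*.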

#### **9.4 Stochastic Population Growth**

A stochastic model contains two components: a model for the dynamics of the environment and a model for the response of the vital rates to the environment (Cohen 1979; Tuljapurkar 1990; Caswell 2001). I focus here on the stochastic population growth rate, log *λs*. Consider a population growing according to

$$\mathbf{n}(t+1) = \mathbf{A}(t)\mathbf{n}(t) \tag{9.12}$$

where the projection matrix **A***(t)* is generated by a realization of an ergodic stochastic environment that produces, for every environmental state, a set of vital rates that satisfy certain regularity conditions. Then, the asymptotic long-term growth rate is, with probability one,

$$\log \lambda\_s = \lim\_{T \to \infty} \frac{1}{T} \log \left\| \mathbf{A}(T-1) \cdots \mathbf{A}(0)\mathbf{n}\_0 \right\| \tag{9.13}$$

(Cohen 1976; Tuljapurkar and Orzack 1980; Tuljapurkar 1990). This growth rate plays a central role in demography and biodemography in stochastic environments, exactly analogous to the role played by the population growth rate *λ* or *r* = log *λ* in stable population theory in constant environments. Cohen (1986) and Lee and Tuljapurkar (1994) have incorporated models of the form (9.12), with the addition of immigration terms, into the context of human population projections, to provide estimates of confidence intervals more rigorous than the "high, medium, low" scenarios usually reported.
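In practice log *λs* in (9.13) is estimated by simulation: project a long sequence of matrices, renormalizing at each step and accumulating the one-step log growth rates. A sketch with a hypothetical two-state iid environment (the matrices and chain are invented for illustration):

```python
import numpy as np

def stoch_growth_rate(P, mats, T=20000, seed=0):
    """Estimate log lambda_s (Eq. 9.13) by simulation. P is the column-
    stochastic environmental transition matrix, p_ij = Pr[u(t+1)=i | u(t)=j];
    mats[i] is the projection matrix for environmental state i."""
    rng = np.random.default_rng(seed)
    K = P.shape[0]
    n = np.ones(mats[0].shape[0])
    u, total = 0, 0.0
    for _ in range(T):
        n = mats[u] @ n
        r = n.sum()                    # one-step growth of total population
        total += np.log(r)
        n /= r                         # renormalize to prevent overflow
        u = rng.choice(K, p=P[:, u])   # draw the next environmental state
    return total / T

# Hypothetical "good-year" and "bad-year" matrices, iid environment.
A_good = np.array([[0.0, 2.0], [0.8, 0.9]])
A_bad = np.array([[0.0, 0.5], [0.4, 0.6]])
P_iid = np.full((2, 2), 0.5)
loglam = stoch_growth_rate(P_iid, [A_good, A_bad])
```

The estimate necessarily lies between the log growth rates of the two matrices alone, and converges as the simulation length *T* grows.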

The additional component in stochastic environments adds an extra layer of complexity to the LTRE decomposition of the stochastic growth rate (Fig. 9.1). The difference in log *λs* between two treatments is partly due to differences in the environmental dynamics and partly due to differences in the vital rates within each environmental state.

In this chapter, I consider the case in which the environment is described by a finite-state Markov chain. Ecological examples include years with or without fire (Silva et al. 1991), years since fire (Caswell and Kaye 2001), years with early or late floods or with high or low precipitation (Smith et al. 2005), and years with good or poor sea ice conditions (Hunter et al. 2010; Jenouvrier et al. 2009b). The Markovian environment case also includes the situation where the environment is modelled implicitly by selecting randomly from a set of empirically measured matrices (e.g., Bierzychudek 1982; Cohen et al. 1983; Jenouvrier et al. 2009a). Let *u(t)* be the state of the environment at time *t*. The environmental dynamics are determined by the Markov chain transition matrix **P**, where *pij* = *P*[*u(t* + 1*)* = *i* | *u(t)* = *j*].

**Fig. 9.1** The determination of population growth rate in (**a**) time-invariant and (**b**) stochastic models. The deterministic growth rate *λ* is defined by a set of vital rates, which are determined by the environment ("treatment"). The stochastic growth rate log *λs* requires an additional model for the stochastic dynamics of the environment and a function giving the response of the vital rates to the state of the environment

The second part of the model is the response of the vital rates to the environment. Let *θ* be a vector of parameters that determine the projection matrix **A**. The vectors *θ*<sub>1</sub>*,..., θ<sub>K</sub>* correspond to environmental states 1*,...,K*. I will write the entire set of vital rates as

$$\Theta = \{\theta\_1, \dots, \theta\_K\}\,. \tag{9.14}$$

We write **A***(t)* = **A**[*θ(t)*], and the stochastic growth rate (9.13) becomes

$$\log \lambda\_s \left[ \mathbf{P}, \Theta \right] = \lim\_{T \to \infty} \frac{1}{T} \log \left\| \mathbf{A}[\theta(T-1)] \cdots \mathbf{A}[\theta(0)] \mathbf{n}\_0 \right\| \tag{9.15}$$

where *θ(t)* is the parameter vector created by the environmental state *u(t)*. I have written log *λs* as an explicit function of **P** and Θ to emphasize that it depends on both the environment and the vital rate response.

#### *9.4.1 Environment-Specific Sensitivities*

The sensitivity of log *λs* to the vital rates was given by Tuljapurkar (1990). For the LTRE analysis, we require the derivatives of log *λs* with respect to the parameters in each state of the environment; i.e., with respect to each of the vectors *θ<sub>i</sub>* in Θ. These environment-specific sensitivities were given by Caswell (2005) and independently by Horvitz et al. (2005), and have been applied by Gervais et al. (2006), Aberg et al. (2009), and Svensson et al. (2009). Rewriting Tuljapurkar's (1990) formula in matrix calculus notation yields the derivative of log *λs* with respect to the vital rate vector in environment *i*:

$$\frac{d\log\lambda\_s}{d\boldsymbol{\theta}^\mathsf{T}}\bigg|\_{u=i} = \lim\_{T\to\infty} \frac{1}{T} \sum\_{t=0}^{T-1} J\_t \frac{\left[\mathbf{w}(t)^\mathsf{T}\otimes\mathbf{v}(t+1)^\mathsf{T}\right]}{R\_t\,\mathbf{v}^\mathsf{T}(t+1)\,\mathbf{w}(t+1)} \frac{d\,\mathbf{vec}\,\mathbf{A}[\boldsymbol{\theta}(t)]}{d\boldsymbol{\theta}^\mathsf{T}}.\tag{9.16}$$

This is the stochastic analogue of the expression (3.46) in Chap. 3, for the sensitivity of the deterministic growth rate. The vectors **w***(t)* and **v***(t)* are the stochastic analogues of the right and left eigenvectors of a deterministic model, and *Rt* is the growth of total population size from *t* to *t* +1. See Caswell (2001, Section 14.4) for a step-by-step algorithm for the calculation.

To make the sensitivity environment-specific, *J<sub>t</sub>* is an indicator variable, defined as

$$J\_t = \begin{cases} 1 & \text{if } u(t) = i \\ 0 & \text{otherwise} \end{cases} \tag{9.17}$$

If the parameters *θ* consist of the elements of **A**, then *d*vec **A***/dθ*<sup>T</sup> = **I**, where **I** is the identity matrix. If *θ* contains lower-level parameters, then *d*vec **A***/dθ*<sup>T</sup> contains the derivatives of **A** with respect to these parameters.
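Equation (9.16), with *θ* = vec **A** so that *d*vec **A***/dθ*<sup>T</sup> = **I**, can be implemented by simulating the sequences **w***(t)* (forward) and **v***(t)* (backward) along one long sample path. A sketch following the structure summarized above (the burn-in length and the normalizations are implementation choices, not prescribed by the formula):

```python
import numpy as np

def env_sensitivity(P, mats, i, T=20000, burn=1000, seed=1):
    """Approximate d log(lambda_s) / d(vec A)' for environmental state i
    (Eq. 9.16) along one simulated sample path of the Markov chain P
    (column-stochastic); mats[j] is the matrix for environmental state j."""
    rng = np.random.default_rng(seed)
    K, s = P.shape[0], mats[0].shape[0]
    u = np.zeros(T + 1, dtype=int)
    for t in range(T):                          # environmental sample path
        u[t + 1] = rng.choice(K, p=P[:, u[t]])
    w = np.zeros((T + 1, s))                    # stochastic right vectors w(t)
    w[0] = np.ones(s) / s
    R = np.zeros(T)                             # one-step growth rates R(t)
    for t in range(T):
        z = mats[u[t]] @ w[t]
        R[t] = z.sum()
        w[t + 1] = z / R[t]
    v = np.zeros((T + 1, s))                    # stochastic left vectors v(t)
    v[T] = np.ones(s) / s
    for t in range(T - 1, -1, -1):
        z = v[t + 1] @ mats[u[t]]
        v[t] = z / z.sum()
    grad = np.zeros(s * s)                      # accumulate Eq. (9.16)
    for t in range(burn, T - burn):
        if u[t] == i:                           # indicator J_t of Eq. (9.17)
            grad += np.kron(w[t], v[t + 1]) / (R[t] * (v[t + 1] @ w[t + 1]))
    return grad / (T - 2 * burn)
```

In a constant environment (*K* = 1) the result collapses to the deterministic sensitivity of log *λ*, which provides a convenient check of the implementation.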

#### **9.5 LTRE Decomposition Analysis for log** *λs*

Suppose now that we have two treatments, and want to decompose the difference,

$$\log \lambda\_s^{(2)} - \log \lambda\_s^{(1)} = \log \lambda\_s \left[ \mathbf{P}^{(2)}, \Theta^{(2)} \right] - \log \lambda\_s \left[ \mathbf{P}^{(1)}, \Theta^{(1)} \right] \tag{9.18}$$

into contributions. This difference compares growth in treatment 2 to growth in treatment 1. Treatment 1, the reference treatment, could be a control in a manipulative experiment, or some other specific condition of interest (as in the example to be considered below), or an average over treatments in a factorial experiment.

The treatment effect on log *λs* in (9.18) depends on both the differences in environmental dynamics (captured in the transition matrices **P**<sup>(1)</sup> and **P**<sup>(2)</sup>) and the differences in the vital rate responses (captured in the parameter arrays Θ<sup>(1)</sup> and Θ<sup>(2)</sup>). Because log *λs* is calculated numerically from (9.15) by simulation, it cannot be differentiated<sup>1</sup> with respect to **P**, so we will use the Kitagawa-Keyfitz

<sup>1</sup>Well, not by me. But see Steinsaltz et al. (2011) for a rigorous development of the sensitivity analysis of stochastic growth rates that includes the effects of changes in the entries of **P**.

decomposition for the environmental dynamics contribution, and environment-specific derivatives (9.16) for the vital rate response contributions.

Let us consider three cases: the case where only the vital rate responses differ, the case where only the environmental dynamics differ, and finally the case where both differ.

#### *9.5.1 Case 1: Vital Rates Differ, Environments Identical*

Consider two treatments that affect the vital rate responses but not the environmental dynamics. For example, one might want to compare low and high fertility sites subjected to a common fire frequency. The transition matrix **P** is identical in the two sites, but the vital rates differ. The stochastic growth rates are

$$\log \lambda\_s^{(1)} = \log \lambda\_s \left[ \mathbf{P}, \Theta^{(1)} \right] \tag{9.19}$$

$$\log \lambda\_s^{(2)} = \log \lambda\_s \left[ \mathbf{P}, \Theta^{(2)} \right]. \tag{9.20}$$

The difference in log *λs* is composed of contributions from vital rate differences in each state of the environment. To first order,

$$\log \lambda\_s^{(2)} - \log \lambda\_s^{(1)} \approx \sum\_{i=1}^{K} \left( \frac{\partial \log \lambda\_s}{\partial \boldsymbol{\theta}^{\mathsf{T}}} \bigg|\_{u=i} \right) \left( \boldsymbol{\theta}\_i^{(2)} - \boldsymbol{\theta}\_i^{(1)} \right) \tag{9.21}$$

where the derivatives are the environment-specific sensitivities (9.16), evaluated at the mean of Θ<sup>(1)</sup> and Θ<sup>(2)</sup>. The *i*th term of the summation in (9.21) is the contribution of differences in the *i*th environment. These contributions can be written as the elements of a contribution matrix (dimension 1 × *p*)

$$\mathbf{C}(\boldsymbol{\theta}\_i) = \left. \frac{\partial \log \lambda\_s}{\partial \boldsymbol{\theta}^{\mathsf{T}}} \right|\_{u=i} \mathcal{D} \left( \boldsymbol{\theta}\_i^{(2)} - \boldsymbol{\theta}\_i^{(1)} \right) \qquad i = 1, \ldots, K. \tag{9.22}$$

#### *9.5.2 Case 2: Vital Rates Identical, Environments Differ*

Now consider two treatments that affect the environmental dynamics (given by **P**<sup>(1)</sup> and **P**<sup>(2)</sup>) but not the vital rate responses. An example would be a comparison of population growth before and after implementing a fire control strategy that changes the frequency of fire, but has no effect on how the vital rates respond to fire. The stochastic growth rates are

$$\log \lambda\_s^{(1)} = \log \lambda\_s \left[ \mathbf{P}^{(1)}, \Theta \right] \tag{9.23}$$

$$\log \lambda\_s^{(2)} = \log \lambda\_s \left[ \mathbf{P}^{(2)}, \Theta \right]. \tag{9.24}$$

The matrices **P**<sup>(1)</sup> and **P**<sup>(2)</sup> may differ in their long-term frequencies of environmental states. Those long-term frequencies are given by the stationary distributions, i.e., the right eigenvector *π* corresponding to the dominant eigenvalue of **P** (which always equals 1), scaled so that *π* sums to 1. The same frequency of environmental states, however, can be obtained from processes with different autocorrelation patterns, from negative autocorrelation (where states tend to alternate) to positive autocorrelation (characterized by long runs of the same state); see Caswell and Kaye (2001, Fig. 2) for an example. So, **P**<sup>(1)</sup> and **P**<sup>(2)</sup> may differ in their stationary distributions, their autocorrelation patterns, or both. To separate these contributions using the Kitagawa-Keyfitz decomposition, we construct a Markov chain with the same stationary distribution *π* as **P**, but in which successive environmental states are independent, and hence there is no autocorrelation. This chain has the transition matrix

$$\mathbf{Q} = \pi \mathbf{1}^{\mathsf{T}} \tag{9.25}$$

where **1** is a vector of ones. Because the next state is independent of the previous state, and the same matrix is applied at each time, this process is called "independent and identically distributed," and abbreviated "iid."
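Constructing **Q** from **P** via Eq. (9.25) requires only the stationary distribution. A sketch, using a hypothetical positively autocorrelated two-state environment:

```python
import numpy as np

def iid_version(P):
    """Return Q = pi 1' (Eq. 9.25): the iid chain with the same stationary
    distribution pi as the column-stochastic matrix P. pi is the right
    eigenvector of P for the dominant eigenvalue 1, scaled to sum to 1."""
    vals, vecs = np.linalg.eig(P)
    pi = vecs[:, vals.real.argmax()].real
    pi = pi / pi.sum()
    return np.outer(pi, np.ones(P.shape[0]))

# Hypothetical two-state environment with long runs of state 1.
P = np.array([[0.9, 0.3],
              [0.1, 0.7]])
Q = iid_version(P)
```

Here every column of `Q` equals *π* = (0.75, 0.25)<sup>T</sup>: the same long-run state frequencies as **P**, but with successive states drawn independently.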

The contribution to log *λs*<sup>(2)</sup> − log *λs*<sup>(1)</sup> of differences in **P** is

$$C(\mathbf{P}) = \log \lambda\_s \left[ \mathbf{P}^{(2)}, \Theta \right] - \log \lambda\_s \left[ \mathbf{P}^{(1)}, \Theta \right]. \tag{9.26}$$

The contribution of the difference in the iid part of the environment is

$$C(\mathbf{Q}) = \log \lambda\_s \left[ \mathbf{Q}^{(2)}, \Theta \right] - \log \lambda\_s \left[ \mathbf{Q}^{(1)}, \Theta \right]. \tag{9.27}$$

The contribution of differences in environmental autocorrelation, denoted by *C(***R***)*, is obtained by subtraction:

$$C(\mathbf{R}) = C(\mathbf{P}) - C(\mathbf{Q}).\tag{9.28}$$

#### *9.5.3 Case 3: Vital Rates and Environments Differ*

Finally, consider two treatments that differ in both the environmental dynamics (**P**<sup>(1)</sup> and **P**<sup>(2)</sup>) and the vital rate responses (Θ<sup>(1)</sup> and Θ<sup>(2)</sup>). The stochastic growth rates are

$$\log \lambda\_s^{(1)} = \log \lambda\_s \left[ \mathbf{P}^{(1)}, \Theta^{(1)} \right] \tag{9.29}$$

$$\log \lambda\_s^{(2)} = \log \lambda\_s \left[ \mathbf{P}^{(2)}, \Theta^{(2)} \right]. \tag{9.30}$$

Our goal is to decompose log *λs*<sup>(2)</sup> − log *λs*<sup>(1)</sup> into contributions from the differences in the stationary environmental frequencies (*C(***Q***)*), in the autocorrelation pattern (*C(***R***)*), and in the vital rates in each environmental state (*C(θ*<sub>1</sub>*), ..., C(θ<sub>K</sub>)*). The decomposition analysis proceeds in three steps.

1. Write the contributions of the environmental differences using the Kitagawa-Keyfitz method

$$\begin{aligned} C(\mathbf{P}) &= \frac{1}{2} \left( \log \lambda\_s \left[ \mathbf{P}^{(2)}, \Theta^{(2)} \right] - \log \lambda\_s \left[ \mathbf{P}^{(1)}, \Theta^{(2)} \right] \right. \\ &\left. \quad + \log \lambda\_s \left[ \mathbf{P}^{(2)}, \Theta^{(1)} \right] - \log \lambda\_s \left[ \mathbf{P}^{(1)}, \Theta^{(1)} \right] \right) \end{aligned} \tag{9.31}$$

$$\begin{aligned} C(\mathbf{Q}) &= \frac{1}{2} \left( \log \lambda\_s \left[ \mathbf{Q}^{(2)}, \Theta^{(2)} \right] - \log \lambda\_s \left[ \mathbf{Q}^{(1)}, \Theta^{(2)} \right] \right. \\ &\left. \quad + \log \lambda\_s \left[ \mathbf{Q}^{(2)}, \Theta^{(1)} \right] - \log \lambda\_s \left[ \mathbf{Q}^{(1)}, \Theta^{(1)} \right] \right) \end{aligned} \tag{9.32}$$

$$C(\mathbf{R}) = C(\mathbf{P}) - C(\mathbf{Q}) \tag{9.33}$$

Each of *C(***P***)*, *C(***Q***)*, and *C(***R***)* is a scalar.

2. Write the contributions of the vital rate differences using the Kitagawa-Keyfitz method

$$C(\Theta) = \frac{1}{2} \left\{ \log \lambda\_s \left[ \mathbf{P}^{(2)}, \Theta^{(2)} \right] - \log \lambda\_s \left[ \mathbf{P}^{(2)}, \Theta^{(1)} \right] \right\}$$

$$+ \frac{1}{2} \left\{ \log \lambda\_s \left[ \mathbf{P}^{(1)}, \Theta^{(2)} \right] - \log \lambda\_s \left[ \mathbf{P}^{(1)}, \Theta^{(1)} \right] \right\} \tag{9.34}$$

*C(*Θ*)* is a scalar, summing the effects of differences in all of the parameter responses at all states of the environment. It is decomposed further in the next step:

3. Use the environment-specific derivatives of log *λs* to decompose each term in (9.34) into contributions from the vital rates in each environment, using (9.22)

$$\begin{split} \mathbf{C}(\boldsymbol{\theta}\_i) &= \frac{1}{2} \left( \frac{\partial \log \lambda\_s \left[ \mathbf{P}^{(2)}, \bar{\Theta} \right]}{\partial \boldsymbol{\theta}^{\mathsf{T}}} \bigg|\_{u=i} \right) \mathcal{D} \left( \boldsymbol{\theta}\_i^{(2)} - \boldsymbol{\theta}\_i^{(1)} \right) \\ &\quad + \frac{1}{2} \left( \frac{\partial \log \lambda\_s \left[ \mathbf{P}^{(1)}, \bar{\Theta} \right]}{\partial \boldsymbol{\theta}^{\mathsf{T}}} \bigg|\_{u=i} \right) \mathcal{D} \left( \boldsymbol{\theta}\_i^{(2)} - \boldsymbol{\theta}\_i^{(1)} \right) \end{split} \tag{9.35}$$

for *i* = 1*,...,K*, with the derivatives evaluated at Θ̄, the mean of the vital rates in the two treatments being compared. The matrix **C***(θ<sub>i</sub>)* is a (1 × *p*) vector whose entries give the contributions to the difference in log *λs* from each of the vital rates in environment *i*.

The total contribution of the parameter differences given in (9.34) is

$$C(\Theta) = \sum\_{i=1}^{K} \mathbf{C}(\boldsymbol{\theta}\_i)\, \mathbf{1}\_p. \tag{9.36}$$

These calculations are easily implemented by writing subroutines to calculate log *λs* and the environment-specific sensitivities given a transition matrix and a set of parameters. The accuracy of the approximations involved can be checked by comparing

$$\log \lambda\_s^{(2)} - \log \lambda\_s^{(1)} \stackrel{?}{\approx} C(\mathbf{Q}) + C(\mathbf{R}) + \sum\_{i=1}^{K} \mathbf{C}(\boldsymbol{\theta}\_i)\, \mathbf{1}\_p. \tag{9.37}$$

#### **9.6 An Example: Fire and an Endangered Plant**

I know of no comparative studies of stochastic population growth that include differences in both the environmental dynamics and the vital rate responses, so here is an artificial example, based on a model for an endangered plant, *Lomatium bradshawii*, in a stochastic fire environment (Caswell and Kaye 2001). *L. bradshawii* (Apiaceae) is a polycarpic herbaceous perennial plant. It exists in only a few isolated populations in prairies of Oregon and Washington. These habitats were, until recent times, subject to natural and anthropogenic fires, to which *L. bradshawii* seems to have adapted. Fires increase plant size and seedling recruitment, but the effect fades within a few years. Populations in recently burned areas have higher growth rates and lower probabilities of extinction than unburned populations. For more information, see Pendergrass et al. (1999), Caswell and Kaye (2001), and Kaye et al. (2001).

A stochastic demographic model for *L. bradshawii* was developed by Caswell and Kaye (2001), based on data from an experimental burning study. Individuals were classified into six stages based on size and reproductive status: yearlings, small and large vegetative plants, and small, medium, and large reproductive plants. The environment was classified into four states defined by the time since the most recent fire: the year of a fire and 1, 2, and 3+ years post-fire, and vital rates were estimated in each of these environmental states. The matrices are given in Caswell and Kaye (2001).

Populations were studied in two sites: Fisher Butte (FB) and Rose Prairie (RP) in western Oregon. The two sites differed in quality for *L. bradshawii*, with RP superior to FB. Population growth rates were generally higher at RP than at FB (Table 9.1), and the stochastic growth rate was higher at RP than at FB at any fire frequency. The critical fire frequency required to maintain *L. bradshawii* populations was about 0.8–0.9 at FB, but only 0.4–0.5 at RP. The causes of the differences between the sites are not known (Pendergrass et al. 1999).

#### *9.6.1 The Stochastic Fire Environment*

The model for environmental dynamics is a two-state Markov chain for fires (each year is either fire or no fire). This generates a four-state Markov chain for the environmental states (0, 1, 2, and 3 or more years post-fire). Let *f* be the long-term frequency of fire, and *ρ* the temporal autocorrelation coefficient of the fire process (the magnitude of *ρ* determines the rate of decay of correlation as time increases, the sign of *ρ* determines whether the correlation is of one sign, or oscillates). In the two-state fire model, the probability of fire in year *t* + 1 if there was no fire in year *t* is *q* = *f (*1 − *ρ)*. The probability of a fire if there was a fire in year *t* is *p* = *q* + *ρ* (see Caswell 2001, Section 14.1). The resulting transition matrix for the four environmental states is

$$\mathbf{P} = \begin{pmatrix} p & q & q & q \\ 1-p & 0 & 0 & 0 \\ 0 & 1-q & 0 & 0 \\ 0 & 0 & 1-q & 1-q \end{pmatrix} . \tag{9.38}$$

If *ρ <* 0, *f* must satisfy

$$\frac{-\rho}{1-\rho} \le f \le \frac{1}{1-\rho} \tag{9.39}$$

in order to keep probabilities bounded between 0 and 1. See Caswell and Kaye (2001). Note that even if the fire process is iid, so that *ρ* = 0, the environmental process given by (9.38) is not iid.
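As an illustration of this construction (a Python sketch; the function names are mine, not taken from the published MATLAB code), the matrix (9.38) and its stationary distribution can be computed directly. A useful check: the first entry of the stationary distribution, the long-run occupancy of the state "year of a fire," recovers the fire frequency *f*.

```python
import numpy as np

def fire_env_matrix(f, rho):
    """Column-stochastic transition matrix (9.38) for the environmental
    states (0, 1, 2, 3+ years since fire), built from the fire frequency
    f and autocorrelation rho of the two-state fire process."""
    q = f * (1 - rho)          # P[fire at t+1 | no fire at t]
    p = q + rho                # P[fire at t+1 | fire at t]
    if not (0 <= q <= 1 and 0 <= p <= 1):
        raise ValueError("f and rho violate the constraint (9.39)")
    return np.array([[p,     q,     q,     q    ],
                     [1 - p, 0,     0,     0    ],
                     [0,     1 - q, 0,     0    ],
                     [0,     0,     1 - q, 1 - q]])

def stationary_distribution(P):
    """Stationary distribution: the right eigenvector of P associated
    with the eigenvalue 1, normalized to sum to one."""
    vals, vecs = np.linalg.eig(P)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    return pi / pi.sum()
```

For example, `stationary_distribution(fire_env_matrix(0.7, 0.5))` has 0.7 as its first entry, whatever the (valid) autocorrelation.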

#### *9.6.2 LTRE Analysis*

There is no information on differences in fire dynamics at the two sites, so Caswell and Kaye (2001) studied the response of log *λs* to the frequency and autocorrelation of fires. Here, we use stochastic LTRE analysis to decompose the differences in log *λs* in three hypothetical scenarios of environmental differences. I will use the matrix entries as the vital rates *θ*, there being no natural lower-level parameterization in this model. MATLAB code for the calculations is available as an appendix to Caswell (2010).

The stochastic growth rate log *λs* increases with fire frequency at both sites. The RP site has a growth advantage, with $\log \lambda_s^{(RP)} > \log \lambda_s^{(FB)}$ at all fire frequencies. The RP advantage, measured by $\log \lambda_s^{(RP)} - \log \lambda_s^{(FB)}$, increases from ≈0.02 when *f* = 0 to ≈0.13 when *f* = 1.

**Differences in vital rates and environmental transitions (Case 3)** Suppose that the two sites differ in both environmental dynamics and vital rate responses, with fire frequencies of *f* = 0.5 at FB and *f* = 0.7 at RP, and autocorrelations of *ρ* = −0.5 at FB and *ρ* = 0.5 at RP (the values used in Fig. 9.2).

In this hypothetical scenario, the FB population tends to experience alternating years with and without fires; in RP, there is a tendency for long runs of years with and without fires. For additional scenarios, see Caswell (2010).

To decompose the treatment effect $\log \lambda_s^{(RP)} - \log \lambda_s^{(FB)}$, we construct the Markov chain transition matrices from (9.38), and calculate the stationary distributions $\boldsymbol{\pi}^{(RP)}$ and $\boldsymbol{\pi}^{(FB)}$ as eigenvectors of **P**. For each site, we generate the iid transition matrix **Q** from (9.25), and compute the contributions *C*(**P**) from (9.31), *C*(**Q**) from (9.32), and *C*(**R**) from (9.33). Then we compute the environment-specific sensitivities of log *λs* from (9.16), for both $\mathbf{P}^{(RP)}$ and $\mathbf{P}^{(FB)}$, and use these to calculate the contributions $\mathbf{C}(\boldsymbol{\theta}_i)$ of the vital rates in each environmental state, using (9.35). Finally, we sum the $\mathbf{C}(\boldsymbol{\theta}_i)$ to obtain the integrated effect of all vital rate differences in each environment.

Figure 9.2 shows these contributions. Most of the growth rate advantage of the RP site can be attributed to an RP advantage in $\mathbf{A}[\boldsymbol{\theta}_1]$ and $\mathbf{A}[\boldsymbol{\theta}_2]$ (the year of a fire and the year immediately following a fire). The difference in the long-term frequency of environmental states, and the differences in autocorrelation patterns, make relatively little contribution.

**Fig. 9.2** The contributions of the iid component of the environment (**Q**), the autocorrelated component of the environment (**R**), and the projection matrix entries in each environmental state ($\mathbf{A}_1, \ldots, \mathbf{A}_4$) to the difference in the stochastic growth rate log *λs* between the Rose Prairie (RP) and Fisher Butte (FB) populations of *Lomatium bradshawii*. Calculations assume fire frequencies of 0.5 for FB and 0.7 for RP, and autocorrelations *ρ* = −0.5 for FB and *ρ* = 0.5 for RP

The accuracy of the approximations involved in the LTRE analysis is good. The sum of the contributions in Fig. 9.2 is 0.1192, while the actual difference in log *λs* is 0.1219 (an accuracy of 98%).

Alternatively, suppose that some kind of fire prevention program in the RP site reduced the fire frequency to *f* = 0.1 (well below the critical threshold for persistence), but a fire management program increased the fire frequency in the FB site to *f* = 0.9.


Now $\log \lambda_s^{(FB)} > \log \lambda_s^{(RP)}$, despite the general advantage in vital rates of RP over FB in most environmental states. Figure 9.3 presents the contributions to log *λs* from differences in fire frequency, autocorrelation, and vital rates, and shows how the contributions of the vital rate differences are, in this case, overwhelmed by the RP disadvantage due to the stationary distribution of the environment.

The sum of the contributions in Fig. 9.3 is −0.1326, while the actual difference in log *λs* is −0.1395 (an accuracy of 95%, even with a very large difference in growth rate).

**Fig. 9.3** The contributions of the iid component of the environment (**Q**), the autocorrelated component of the environment (**R**), and the matrix entries in each environmental state ($\mathbf{A}_1, \ldots, \mathbf{A}_4$) to the difference in the stochastic growth rate log *λs* between the Rose Prairie (RP) and Fisher Butte (FB) populations of *Lomatium bradshawii*. Calculations assume fire frequencies of 0.9 for FB and 0.1 for RP, and autocorrelation *ρ* = 0 for both populations

#### **9.7 Discussion**

This application of matrix calculus provides a general framework for decomposition analysis of the stochastic growth rate in Markovian environments. It is a direct generalization of the familiar LTRE approaches for time-invariant and periodic models, but combined with the powerful Kitagawa-Keyfitz decomposition. Comparative studies of the stochastic growth rate require additional data on the stochastic dynamics of the environment, beyond that needed for time-invariant models (Fig. 9.1). Many stochastic studies present conditional results; for example, the study of *L. bradshawii* provides log *λs* as a function of *f* and *ρ* at each site, but does not estimate the value of log *λs* actually exhibited in either of the two sites. To do so would require long-term data on the stochastic environment, which is hard to come by. However, such information may possibly be extracted from historical data (e.g., Smith et al. 2005; Lawler et al. 2009), or projected from climate models (Hunter et al. 2010; Jenouvrier et al. 2009b).

The methods presented here are not limited to Markovian environments in which the environmental states have an interpretation (years since fire, flood conditions, etc.). They can also be used when matrices are randomly selected from a series collected over time (e.g., the early study of Bierzychudek (1982) based on two yearly matrices, or the study by Jenouvrier et al. (2009b) based on 44 years of matrices for emperor penguins). Although such models are indeed Markov chains, if years are simply a random sample of environmental variation, then it is of little interest to know the contribution of vital rate differences in, say, 1988 compared to 1989 or 1987. In these models, the mean and variance of the vital rates may be of more interest. Davison et al. (2010), drawing on the stochastic elasticity results of Tuljapurkar et al. (2003), have presented an approach to LTRE analysis in terms of the contributions of differences in the mean and the variance of the vital rates. That method nicely complements the approach presented here.

In the analysis of *Lomatium bradshawii*, even large differences in environmental autocorrelation made small contributions to treatment effects on log *λs*. This is not surprising, given the generally small impact of changes in autocorrelation on the stochastic growth rate in this model (Caswell and Kaye 2001). It is, however, not guaranteed. Given the proper interaction between environmental states and the stage structure, autocorrelation can have dramatic impacts on the growth rate (Caswell 2001, Example 14.1). How often this happens in nature will only be revealed by further comparative studies.

Changing focus from plants in a fluctuating fire environment to human populations projected in response to stochastic fluctuations in mortality and fertility (e.g., Tuljapurkar 1992; Lee and Tuljapurkar 1994), there are possibilities for applying this approach to population projections. However, such attempts will be challenging because the stochastic environments are not stationary, and the interest is not in asymptotic stochastic growth, but in short-term transient dynamics. A combination of the transient analyses in Chap. 7 with the decomposition approach here might yield interesting results.

#### **Bibliography**



# **Part IV Nonlinear Models**

# **Chapter 10 Sensitivity Analysis of Nonlinear Demographic Models**

#### **10.1 Introduction**

Nonlinearities in demographic models arise due to density dependence, frequency dependence (in two-sex models), feedback through the environment or the economy, recruitment subsidy due to immigration, and from the scaling inherent in calculations of proportional population structure. This chapter presents a series of analyses particular to nonlinear models: the sensitivity and elasticity of equilibria, cycles, ratios (e.g., dependency ratios), age averages and variances, temporal averages and variances, life expectancies, and population growth rates, for both age-classified and stage-classified models.

Nonlinearity is defined in contrast to linearity. If **x** is an age or stage distribution vector, and if the dynamics of **x** are given by

$$\mathbf{x}(t+1) = f[\mathbf{x}(t)],\tag{10.1}$$

then the model is linear if *f (*·*)* is a linear function, i.e., if

$$f\left(a\mathbf{x}_1 + b\mathbf{x}_2\right) = af\left(\mathbf{x}_1\right) + bf\left(\mathbf{x}_2\right) \tag{10.2}$$

for any constants *a* and *b* and any vectors **x**<sup>1</sup> and **x**2.

If a model is not linear, it is nonlinear. Not surprisingly, this covers a lot of territory, but nonlinearity in demographic models can be classified into four main sources: density dependence, environmental feedback, interactions between the sexes, and models that arise in calculation of proportional structure.

Chapter 10 is modified, under the terms of a Creative Commons Attribution License, from: Caswell, H. 2008. Perturbation analysis of nonlinear matrix population models. Demographic Research 18:59–116. ©Hal Caswell.

**Density dependence** arises when one or more of the per-capita vital rates are functions of the numbers or density of the population. Such effects have been incorporated into demographic studies of plants (e.g., Solbrig et al. 1988; Gillman et al. 1993; Silva Matos et al. 1999; Pardini et al. 2009; Shyu et al. 2013) and animals (e.g., Pennycuick 1969; Clutton-Brock et al. 1997; Cushing et al. 2003; Bonenfant et al. 2009). Density dependence has been intensively studied in the laboratory (e.g., Pearl et al. 1927; Frank et al. 1957; Costantino and Desharnais 1991; Carey et al. 1995; Mueller and Joshi 2000; Cushing et al. 2003). It can arise from competition for food, space, or other resources, or from interactions (e.g., cannibalism) among individuals.

Simple density dependence is less often invoked by human demographers.<sup>1</sup> Weiss and Smouse (1976) proposed a density-dependent matrix model, and Wood and Smouse (1982) applied it to the Gainj people of Papua New Guinea. Density dependence is included in epidemiological feedback models applied to a rural English population in the sixteenth and seventeenth centuries by Scott and Duncan (1998).

The Easterlin effect (Easterlin 1961) produces density dependence in which fertility is a function of cohort size. Analysis of the Easterlin effect has focused mostly on the possibility that it could generate cycles in births (e.g., Lee 1974, 1976; Frauenthal and Swick 1983; Wachter and Lee 1989; Chu 1998).

**Environmental** (**or economic**) **feedback**. Density-dependent models are often an attempt to sneak in, by the back door as it were, a feedback through the environment. A change in population size changes some aspect of the environment, which affects the vital rates, which in turn affect future population size. Models in which the feedback operates through resource consumption are the basis for the food chain and food web models that underlie models of global biogeochemistry (e.g., Hsu et al. 1977; Tilman 1982; Murdoch et al. 2003; Fennel and Neumann 2004). These models are typically unstructured, but there is a rich literature on structured models, written as partial differential equations, to incorporate physiological structure and resource feedback (de Roos and Persson 2013).

Feedback models are also invoked in human demography, with the feedback operating through the economy (Lee 1986, 1987; Chu 1998). An interesting aspect of these approaches is the possibility that, if larger populations support more robust economies, the feedback could be positive instead of negative (Lee 1986; Cohen 1995, Appendix 6). An exciting combination of ecological and

<sup>1</sup>Lee (1987) reviewed the situation and said ". . . we might say that human demography is all about Leslie matrices and the determinants of unconstrained growth in linear models, whereas animal population studies are all about Malthusian equilibrium through density dependence in nonlinear models . . . ". He admits that this is an exaggeration, and there clearly are nonlinear concerns in human demography (Bonneuil 1994), but a non-exhaustive survey finds no mention of density dependence in several contemporary human demography texts (e.g., Hinde 1998; Preston et al. 2001; Keyfitz and Caswell 2005).

economic feedback appears in the food ratio model recently proposed by Lee and Tuljapurkar (2008).

**Two-sex models**. To the extent that both males and females are required for reproduction (and, in the bigger scheme of things, this is not always so), demography is nonlinear because the marriage function or mating function cannot satisfy (10.2). Nonlinear two-sex models have a long tradition in human demography (see reviews in Keyfitz 1972; Pollard 1977) and have been applied in ecology (e.g., Lindström and Kokko 1998; Legendre et al. 1999; Kokko and Rankin 2006; Lenz et al. 2007; Jenouvrier et al. 2010, 2012). Their mathematical properties have been investigated by, e.g., Caswell and Weeks (1986), Chung (1994), and Iannelli et al. (2005), and in a very abstract setting by Nussbaum (1988, 1989).

In their most basic form, two-sex models differ from density-dependent models in that the vital rates depend only on the relative, not the absolute, abundances of stages in the population (they are sometimes called frequency-dependent for this reason). This has important implications for their dynamics.

**Models for proportional population structure**. Even when the dynamics of abundance are linear, the dynamics of *proportional* population structure are nonlinear (e.g., Tuljapurkar 1997). This leads to some useful results on the sensitivity of the stable age or stage distribution and the reproductive value.

Linear models lead to exponential growth and convergence to a stable structure. Much of their analysis focuses on the population growth rate *λ* or *r* = log *λ*. Nonlinear models do not usually lead to exponential growth (frequency-dependent two-sex models are an exception). Instead, their trajectories converge to an attractor. The attractor may be an equilibrium point, a cycle, an invariant loop (yielding quasiperiodic dynamics), or a strange attractor (yielding chaotic dynamics); see Cushing (1998) or Caswell (2001, Chapter 16) for a detailed discussion.

This chapter analyzes the sensitivity and elasticity of equilibria and cycles. Because the dynamic models considered here are discrete, solutions always exist and are unique. The nature and the number of the attractors depend on the specific model. Perturbation analysis always considers perturbations of *something*, so the equilibria or cycles must be found before their perturbation properties can be analyzed.

#### **10.2 Density-Dependent Models**

We begin with the basic discrete-time2 density-dependent model, written as

$$\mathbf{n}(t+1) = \mathbf{A}[\theta, \mathbf{n}(t)] \text{ } \mathbf{n}(t) \tag{10.3}$$

$$\frac{d\mathbf{n}}{dt} = \mathbf{A}[\boldsymbol{\theta}, \mathbf{n}(t)] \ \mathbf{n}(t)$$

<sup>2</sup>It is possible to generalize to continuous-time models, which would be written

where **n***(t)* is a population vector of dimension *s* × 1 and **A** is a population projection matrix of dimension *s* × *s*. The matrix **A** depends on a *p* × 1 vector *θ* of parameters as well as on the current population vector **n***(t)*.<sup>3</sup>

#### *10.2.1 Linearizations Around Equilibria*

An equilibrium of (10.3) satisfies

$$
\hat{\mathbf{n}} = \mathbf{A} \left[ \boldsymbol{\theta}, \hat{\mathbf{n}} \right] \hat{\mathbf{n}}.\tag{10.4}
$$

Such an equilibrium may be stable (small perturbations from $\hat{\mathbf{n}}$ eventually return to the equilibrium) or unstable.<sup>4</sup> Stability is determined by the linearization of the nonlinear system (10.3) near $\hat{\mathbf{n}}$. That is, define the deviation from $\hat{\mathbf{n}}$ as $\mathbf{z}(t) = \mathbf{n}(t) - \hat{\mathbf{n}}$. Then $\mathbf{z}(t)$ follows

$$\mathbf{z}(t+1) = \mathbf{M}\left[\boldsymbol{\theta}, \hat{\mathbf{n}}\right] \mathbf{z}(t) \tag{10.5}$$

The matrix **M** is the Jacobian matrix,

$$\mathbf{M} = \left. \frac{\partial \mathbf{n}(t+1)}{\partial \mathbf{n}^{\mathsf{T}}(t)} \right|_{\hat{\mathbf{n}}} \tag{10.6}$$

To obtain **M**, differentiate both sides of (10.3),

$$d\mathbf{n}(t+1) = (d\mathbf{A})\,\mathbf{n} + \mathbf{A}\,(d\mathbf{n}) \tag{10.7}$$

Applying the vec operator to both sides gives

$$d\mathbf{n}(t+1) = \left(\mathbf{n}^{\mathsf{T}} \otimes \mathbf{I}_{s}\right) d\,\text{vec}\,\mathbf{A} + \mathbf{A}\,d\mathbf{n} \tag{10.8}$$

from which

$$\mathbf{M} = \left(\mathbf{n}^{\mathsf{T}} \otimes \mathbf{I}_{s}\right) \frac{\partial\,\text{vec}\,\mathbf{A}}{\partial \mathbf{n}^{\mathsf{T}}} + \mathbf{A} \tag{10.9}$$

for some appropriately defined matrix function **A**; see Verdy and Caswell (2008). Such models are less often used, but see Shyu and Caswell (2016a, 2018) for a two-sex model example.

<sup>3</sup>The explicit dependence on *θ* and **n***(t)* will be neglected when it is obvious from the context.

<sup>4</sup>A careful treatment of stability requires more precise definitions of these terms, but that will not concern us here. See Caswell (2001) and Cushing (1998) for more details.

where $\mathbf{I}_s$ is an identity matrix of order *s*. The linearization at the equilibrium is obtained by evaluating **M** at $\mathbf{n} = \hat{\mathbf{n}}$:

$$\mathbf{M}\left[\boldsymbol{\theta},\hat{\mathbf{n}}\right] = \left(\hat{\mathbf{n}}^{\mathsf{T}} \otimes \mathbf{I}_{s}\right) \frac{\partial\,\text{vec}\,\mathbf{A}\left[\boldsymbol{\theta},\hat{\mathbf{n}}\right]}{\partial \mathbf{n}^{\mathsf{T}}} + \mathbf{A}\left[\boldsymbol{\theta},\hat{\mathbf{n}}\right] \tag{10.10}$$

If all the eigenvalues of **M** are less than one in magnitude, the equilibrium $\hat{\mathbf{n}}$ is locally asymptotically stable. The linearization also provides valuable information about short-term transient responses to perturbation; see Sect. 10.2.4.

#### *10.2.2 Sensitivity of Equilibrium*

Our goal is to find the derivatives of all the entries of **n**ˆ with respect to all of the parameters in *θ*; these are the entries of the *s* × *p* matrix

$$\frac{d\hat{\mathbf{n}}}{d\theta^{\top}}.$$

We begin by taking the differential of both sides of (10.4):

$$d\hat{\mathbf{n}} = (d\mathbf{A})\hat{\mathbf{n}} + \mathbf{A}(d\hat{\mathbf{n}}).\tag{10.11}$$

Rewrite this as

$$d\hat{\mathbf{n}} = \mathbf{I}\_s(d\mathbf{A})\hat{\mathbf{n}} + \mathbf{A}(d\hat{\mathbf{n}}),\tag{10.12}$$

where **I***<sup>s</sup>* is an identity matrix of dimension *s*. Next apply the vec operator to both sides, remembering that since **n**ˆ is a column vector, vec **n**ˆ = **n**ˆ, and apply Roth's theorem, to obtain

$$d\hat{\mathbf{n}} = \left(\hat{\mathbf{n}}^{\mathsf{T}} \otimes \mathbf{I}\_{\mathsf{s}}\right) d\mathsf{vec} \, \mathbf{A} + \mathbf{A}d\hat{\mathbf{n}}.\tag{10.13}$$

However, **A** is a function of both *θ* and **n**ˆ, so

$$d\text{vec}\,\mathbf{A} = \frac{\partial \text{vec}\,\mathbf{A}}{\partial \boldsymbol{\theta}^{\mathsf{T}}}d\boldsymbol{\theta} + \frac{\partial \text{vec}\,\mathbf{A}}{\partial \mathbf{n}^{\mathsf{T}}}d\mathbf{\hat{n}}.\tag{10.14}$$

Substituting (10.14) into (10.13) and applying the chain rule leads to<sup>5</sup>

$$\frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\hat{\mathbf{n}}^{\mathsf{T}} \otimes \mathbf{I}\_{\mathsf{s}}\right) \left(\frac{\partial \text{vec}\,\mathbf{A}}{\partial \boldsymbol{\theta}^{\mathsf{T}}} + \frac{\partial \text{vec}\,\mathbf{A}}{\partial \mathbf{n}^{\mathsf{T}}} \frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\mathsf{T}}}\right) + \mathbf{A}\frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\mathsf{T}}}.\tag{10.15}$$

<sup>5</sup>It is reassuring to check that the dimensions of all the quantities in (10.15) are compatible.

Finally, solve (10.15) for $d\hat{\mathbf{n}}/d\boldsymbol{\theta}^{\mathsf{T}}$ to obtain

$$\frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\mathbf{I}\_{\mathrm{s}} - \mathbf{A} - \left(\hat{\mathbf{n}}^{\mathsf{T}} \otimes \mathbf{I}\_{\mathrm{s}}\right) \frac{\partial \text{vec}\,\mathbf{A}}{\partial \mathbf{n}^{\mathsf{T}}}\right)^{-1} \left(\hat{\mathbf{n}}^{\mathsf{T}} \otimes \mathbf{I}\_{\mathrm{s}}\right) \frac{\partial \text{vec}\,\mathbf{A}}{\partial \boldsymbol{\theta}^{\mathsf{T}}} \tag{10.16}$$

where $\mathbf{A}$, $\partial\,\text{vec}\,\mathbf{A}/\partial\boldsymbol{\theta}^{\mathsf{T}}$, and $\partial\,\text{vec}\,\mathbf{A}/\partial\mathbf{n}^{\mathsf{T}}$ are evaluated at $\hat{\mathbf{n}}$.

Comparing (10.16) and Eq. (10.10) for the linearization, we see that the sensitivity of equilibrium can be written

$$\frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\mathsf{T}}} = (\mathbf{I}\_{s} - \mathbf{M})^{-1} \left( \hat{\mathbf{n}}^{\mathsf{T}} \otimes \mathbf{I}\_{s} \right) \frac{\partial \text{vec}\,\mathbf{A}}{\partial \boldsymbol{\theta}^{\mathsf{T}}}.\tag{10.17}$$

The matrix *(***I***<sup>s</sup>* − **M***)* is singular if 1 is an eigenvalue of **M**; i.e., at a bifurcation point when the equilibrium **n**ˆ becomes unstable. At that point, quite appropriately, the sensitivity is not defined because the change in the equilibrium is not continuous.

The following example, applying (10.16) to a simple model, shows the basic steps and output of the analysis.

**Example 1: A simple two-stage model** The most basic distinction in the life cycle of many organisms is between non-reproducing juveniles and reproducing adults. A model based on these stages (Neubert and Caswell 2000) is parameterized by the juvenile survival *σ*1, the adult survival *σ*2, the growth or maturation probability *γ* (the expected time to maturity is 1*/γ* ), and the adult fertility *f* . The projection matrix is

$$\mathbf{A} = \begin{pmatrix} \sigma_1 (1 - \gamma) & f \\ \sigma_1 \gamma & \sigma_2 \end{pmatrix}. \tag{10.18}$$

Any of the vital rates could be density-dependent; here we suppose that juvenile survival *σ*<sup>1</sup> depends on total density:

$$
\sigma\_{\mathbf{l}}(\mathbf{n}) = \tilde{\sigma} \exp(-\mathbf{1}^{\mathsf{T}}\mathbf{n});\tag{10.19}
$$

where **1** is a vector of ones.

Define the parameter vector as $\boldsymbol{\theta} = \begin{pmatrix} f & \gamma & \tilde{\sigma} & \sigma_2 \end{pmatrix}^{\mathsf{T}}$. As a dimensional check on (10.15),

$$\underbrace{\frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\mathsf{T}}}}_{s\times p} = \underbrace{\left(\hat{\mathbf{n}}^{\mathsf{T}}\otimes\mathbf{I}_{s}\right)}_{s\times s^{2}} \left(\underbrace{\frac{\partial\,\text{vec}\,\mathbf{A}}{\partial\boldsymbol{\theta}^{\mathsf{T}}}}_{s^{2}\times p} + \underbrace{\frac{\partial\,\text{vec}\,\mathbf{A}}{\partial\mathbf{n}^{\mathsf{T}}}}_{s^{2}\times s}\underbrace{\frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\mathsf{T}}}}_{s\times p}\right) + \underbrace{\mathbf{A}}_{s\times s}\underbrace{\frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\mathsf{T}}}}_{s\times p}.$$

To apply (10.16) requires the derivatives of $\mathbf{A}[\boldsymbol{\theta}, \mathbf{n}]$ with respect to $\boldsymbol{\theta}$ and with respect to $\mathbf{n}$. These are

$$\frac{d\,\text{vec}\,\mathbf{A}}{df} = \text{vec}\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\tag{10.20}$$

$$\frac{d\,\text{vec}\,\mathbf{A}}{d\gamma} = \text{vec}\begin{pmatrix} -\sigma_1(\mathbf{n}) & 0 \\ \sigma_1(\mathbf{n}) & 0 \end{pmatrix} \tag{10.21}$$

$$\frac{d\,\text{vec}\,\mathbf{A}}{d\tilde{\sigma}} = \text{vec}\begin{pmatrix} (1-\gamma)\exp(-\mathbf{1}^{\mathsf{T}}\mathbf{n}) & 0 \\ \gamma\exp(-\mathbf{1}^{\mathsf{T}}\mathbf{n}) & 0 \end{pmatrix} \tag{10.22}$$

$$\frac{d\,\text{vec}\,\mathbf{A}}{d\sigma_2} = \text{vec}\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}\tag{10.23}$$

$$\frac{d\,\text{vec}\,\mathbf{A}}{dn_1} = \frac{d\,\text{vec}\,\mathbf{A}}{dn_2} = \text{vec}\begin{pmatrix} -\sigma_1(\mathbf{n})(1-\gamma) & 0 \\ -\sigma_1(\mathbf{n})\,\gamma & 0 \end{pmatrix}.\tag{10.24}$$

The derivative of **A** with respect to *θ* is the 4 × 4 matrix

$$\frac{\partial\,\text{vec}\,\mathbf{A}}{\partial \boldsymbol{\theta}^{\mathsf{T}}} = \begin{pmatrix} 0 & -\sigma_1(\mathbf{n}) & (1 - \gamma)\exp(-\mathbf{1}^{\mathsf{T}}\mathbf{n}) & 0 \\ 0 & \sigma_1(\mathbf{n}) & \gamma\,\exp(-\mathbf{1}^{\mathsf{T}}\mathbf{n}) & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},\tag{10.25}$$

where each column corresponds to an entry of *θ* and each row to an element of vec **A**. The derivative of **A** with respect to **n** is

$$\frac{\partial\,\text{vec}\,\mathbf{A}}{\partial \mathbf{n}^{\mathsf{T}}} = \begin{pmatrix} -\sigma_1(\mathbf{n})(1-\gamma) & -\sigma_1(\mathbf{n})(1-\gamma) \\ -\sigma_1(\mathbf{n})\,\gamma & -\sigma_1(\mathbf{n})\,\gamma \\ 0 & 0 \\ 0 & 0 \end{pmatrix}.\tag{10.26}$$

Each column corresponds to an entry of **n** and each row to an element of vec **A**.

Using some arbitrary parameter values (not unreasonable for humans or other large mammals)

$$f = 0.25 \qquad \gamma = 1/15 \qquad \tilde{\sigma} = 0.98 \qquad \sigma_2 = 0.95$$

leads to an equilibrium population

$$
\hat{\mathbf{n}} = \begin{pmatrix} 0.1053 \\ 0.1109 \end{pmatrix},\tag{10.27}
$$

obtained by iterating the model to convergence.

The resulting sensitivities reflect the life history, although comparative study of this dependence has scarcely begun. For example, if the demographic parameters were more appropriate for an insect, say with high fertility (*f* = 70), rapid maturation (*γ* = 0.9), and low juvenile survival (*σ*˜ = 0.1), and in which most adults die after reproducing once (*σ*2 = 0.01), then the equilibrium would become

$$
\hat{\mathbf{n}} = \begin{pmatrix} 1.826 \\ 0.026 \end{pmatrix} \tag{10.28}
$$

with sensitivities

$$\frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \begin{pmatrix} 0.01 & 1.08 & 9.86 & 0.99 \\ -0.0002 & 0.02 & 0.14 & 0.01 \end{pmatrix} . \tag{10.29}$$

In this life history, increases in fertility have very small effects on the equilibrium population, and the effect of increased fertility on adult density is slightly negative. Changes in the maturation rate or in juvenile or adult survival have much larger impacts on juvenile density than on adult density.
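The steps of Example 1 can be checked numerically. The following is an illustrative sketch in Python/NumPy (the book's computations are in MATLAB; the function names here are mine): it iterates (10.3) to the equilibrium, assembles the derivative matrices (10.25) and (10.26), and applies (10.16)-(10.17). A finite-difference perturbation of *θ* provides an independent check of the analytic sensitivities.

```python
import numpy as np

def proj_matrix(theta, n):
    """Projection matrix (10.18) with density-dependent sigma_1 from (10.19)."""
    f, gam, sig_til, sig2 = theta
    s1 = sig_til * np.exp(-n.sum())
    return np.array([[s1 * (1 - gam), f],
                     [s1 * gam,       sig2]])

def equilibrium(theta, iters=5000):
    """Iterate n(t+1) = A[theta, n(t)] n(t) to convergence."""
    n = np.array([0.1, 0.1])
    for _ in range(iters):
        n = proj_matrix(theta, n) @ n
    return n

def equilibrium_sensitivity(theta):
    """(10.17): dn_hat/dtheta^T = (I - M)^{-1} (n_hat^T kron I) dvecA/dtheta^T."""
    f, gam, sig_til, sig2 = theta
    nhat = equilibrium(theta)
    e = np.exp(-nhat.sum())
    s1 = sig_til * e
    A = proj_matrix(theta, nhat)
    I2 = np.eye(2)
    # (10.25): rows in vec (column-major) order a11, a21, a12, a22
    dA_dth = np.array([[0.0, -s1, (1 - gam) * e, 0.0],
                       [0.0,  s1,  gam * e,      0.0],
                       [1.0,  0.0, 0.0,          0.0],
                       [0.0,  0.0, 0.0,          1.0]])
    # (10.26): both columns identical
    col = np.array([-s1 * (1 - gam), -s1 * gam, 0.0, 0.0])
    dA_dn = np.column_stack([col, col])
    K = np.kron(nhat[None, :], I2)   # n_hat^T kron I_s, shape (2, 4)
    M = K @ dA_dn + A                # Jacobian (10.10)
    return np.linalg.solve(I2 - M, K @ dA_dth), nhat
```

With the mammal-like parameters of the example, `equilibrium` returns a vector matching (10.27).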

## *10.2.3 Dependent Variables: Beyond* **nˆ**

The equilibrium vector **n**ˆ is usually not the only dependent variable of interest. If we write **m** = **m***(***n***)* for any vector- or scalar-valued transformation of **n**, then the sensitivity of **m** is just

$$\frac{d\hat{\mathbf{m}}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \frac{d\hat{\mathbf{m}}}{d\mathbf{n}^{\mathsf{T}}} \frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\mathsf{T}}}.\tag{10.30}$$

The possibilities for dependent variables are, roughly speaking, limited only by one's imagination. The following is a list of examples.

1. Weighted population density. Let **c** ≥ 0 be a vector of weights. Weighted population density is then *N (t)* <sup>=</sup> **<sup>c</sup>**T**n***(t)*. Examples include total density (**c** = **1**), the density of a subset of stages (*ci* = 1 for stages to be counted; *ci* = 0 otherwise), biomass (*ci* is the biomass of stage *i*), basal area, metabolic rate, etc. The sensitivity of *N*ˆ is

$$\frac{d\hat{N}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \mathbf{c}^{\mathsf{T}} \frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\mathsf{T}}}.\tag{10.31}$$

2. Ratios, measuring the relative abundances of different stages. Let

$$R(t) = \frac{\mathbf{a}^{\mathsf{T}}\mathbf{n}(t)}{\mathbf{b}^{\mathsf{T}}\mathbf{n}(t)}\tag{10.32}$$

where **a** ≥ 0 and **b** ≥ 0 are weight vectors. Examples include the dependency ratio (in human populations, the ratio of the individuals below 15 or above 65 to those between 15 and 65; see Sect. 10.5.3), the sex ratio, and the ratio of juveniles to adults, which is used in wildlife management; see Skalski et al. (2005). Differentiating (10.32) gives

$$\frac{d\hat{R}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\frac{\mathbf{b}^{\mathsf{T}}\hat{\mathbf{n}}\,\mathbf{a}^{\mathsf{T}} - \mathbf{a}^{\mathsf{T}}\hat{\mathbf{n}}\,\mathbf{b}^{\mathsf{T}}}{\left(\mathbf{b}^{\mathsf{T}}\hat{\mathbf{n}}\right)^{2}}\right) \frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\mathsf{T}}}.\tag{10.33}$$
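A minimal sketch of how these dependent-variable formulas are used in practice (Python, with illustrative names; the matrix `dn_dtheta` would be supplied by (10.16)):

```python
import numpy as np

def weighted_density_sensitivity(c, dn_dtheta):
    """Sensitivity (10.31) of the weighted density N = c^T n."""
    return c @ dn_dtheta

def ratio_sensitivity(a, b, n_hat, dn_dtheta):
    """Sensitivity (10.33) of the ratio R = (a^T n)/(b^T n) at equilibrium."""
    an = a @ n_hat
    bn = b @ n_hat
    dR_dn = (bn * a - an * b) / bn**2   # the row vector in (10.33)
    return dR_dn @ dn_dtheta
```

Both functions are simple applications of the chain rule (10.30); any other differentiable transformation of $\hat{\mathbf{n}}$ can be handled the same way.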


#### *10.2.4 Reactivity and Transient Dynamics*

The asymptotic stability of an equilibrium is determined by the eigenvalues of the Jacobian matrix **M** in (10.9), evaluated at that equilibrium. In the short term, however, perturbations of the population away from the equilibrium can exhibit transient dynamics that differ from the asymptotic behavior. In particular, perturbations of stable equilibria, although destined eventually to return to the equilibrium, may move (much) farther away before that return. Neubert and Caswell (1997) introduced three indices, each calculated from **M**, to quantify these transient responses.<sup>6</sup> The *reactivity* of an asymptotically stable equilibrium is the maximum, over all perturbations, of the rate at which the trajectory departs from the equilibrium. At any time following a perturbation, there is a maximum (over all perturbations) deviation

<sup>6</sup>Because these indices are calculated from **M**, they are properly considered properties of the system and its dynamics. Stott et al. (2011) and Stott (2016) have also considered indices of transient response that reflect the particular initial condition rather than the inherent dynamics of the system.

from the equilibrium. This maximum is the *amplification envelope*. It gives an upper bound on the extent of transient amplification as a function of time. The phrase "over all perturbations" in these definitions signals that the transient amplification depends on the direction of the perturbation. The perturbation that produces the maximum amplification at any specified time is the *optimal perturbation* (Verdy and Caswell 2008).<sup>7</sup>

The transient dynamics of the perturbed system are described by the evolution of the magnitude of $\mathbf{z}$, as measured by the Euclidean norm $\|\mathbf{z}\| = \sqrt{\mathbf{z}^{\mathsf{T}}\mathbf{z}}$. The reactivity is the maximum, over all perturbations, of the growth rate of $\|\mathbf{z}\|$ as $t \to 0$, and is given by

$$\nu_0 = \begin{cases} \lambda_1 \left[ \mathbf{H}(\mathbf{M}) \right] & \text{continuous time} \\ \log \sigma_1(\mathbf{M}) & \text{discrete time} \end{cases} \tag{10.34}$$

The matrix $\mathbf{H}(\mathbf{M}) = \left(\mathbf{M} + \mathbf{M}^{\mathsf{T}}\right)/2$ is the Hermitian part of **M**, and $\lambda_1$ denotes the eigenvalue with largest real part (Neubert and Caswell 1997). In discrete time, reactivity is the log of the largest singular value of **M**, which we denote $\sigma_1(\mathbf{M})$.

The amplification envelope is

$$\rho(t) = \begin{cases} \sigma_1\left(e^{\mathbf{M}t}\right) & \text{continuous} \\ \sigma_1\left(\mathbf{M}^{t}\right) & \text{discrete} \end{cases} \tag{10.35}$$

The optimal perturbation, normalized to length 1, is given by the right singular vector corresponding to the singular value that defines *ρ(t)*.
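These indices are straightforward to compute. The following sketch (in Python/NumPy, using hypothetical non-normal matrices chosen purely for illustration) evaluates the discrete-time reactivity and amplification envelope of (10.34) and (10.35), and the continuous-time reactivity from the Hermitian part:

```python
import numpy as np

# Reactivity and amplification envelope (10.34)-(10.35) for
# hypothetical non-normal linearizations, discrete and continuous time.
M = np.array([[0.5, 2.0],
              [0.0, 0.5]])           # eigenvalues 0.5: asymptotically stable

sigma1 = lambda X: np.linalg.svd(X, compute_uv=False)[0]

# Discrete time: nu_0 = log sigma_1(M); rho(t) = sigma_1(M^t)
nu0_disc = np.log(sigma1(M))
rho = lambda t: sigma1(np.linalg.matrix_power(M, t))

# Continuous time: nu_0 = lambda_1[H(M)], with H(M) = (M + M^T)/2
Mc = np.array([[-1.0, 3.0],
               [0.0, -1.0]])         # eigenvalues -1: stable in continuous time
H = (Mc + Mc.T) / 2
nu0_cont = np.linalg.eigvalsh(H).max()

print(nu0_disc > 0, rho(1) > 1, rho(30) < 1, nu0_cont > 0)
```

In both cases the equilibrium is asymptotically stable, yet the reactivity is positive: some perturbations are initially amplified before they decay.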

Verdy and Caswell (2008) presented a complete sensitivity analysis of reactivity, the amplification envelope, and the optimal perturbation, in both continuous and discrete time. Let $\xi$ be one of the indices, and suppose that the model depends on a parameter vector $\boldsymbol{\theta}$. Changes in $\boldsymbol{\theta}$ will change the equilibrium vector, which will contribute to changes in the Jacobian matrix, so that the sensitivity of $\xi$ to $\boldsymbol{\theta}$ is

$$\frac{d\boldsymbol{\xi}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\frac{d\boldsymbol{\xi}}{d\mathbf{vec}\,^{\mathsf{T}}\mathbf{M}}\right) \left(\frac{\partial \mathbf{vec}\,\mathbf{M}}{\partial \boldsymbol{\theta}^{\mathsf{T}}} + \frac{\partial \mathbf{vec}\,\mathbf{M}}{\partial \boldsymbol{\hat{n}}^{\mathsf{T}}} \frac{d\boldsymbol{\hat{n}}}{d\boldsymbol{\theta}^{\mathsf{T}}}\right) \tag{10.36}$$

The sensitivity of $\xi$ in (10.36) requires four pieces: the linearization **M** at the equilibrium, which is given by (10.10); the sensitivity of the equilibrium $\hat{\mathbf{n}}$ to the parameters, which is given by (10.16); the sensitivity of the Jacobian matrix **M** to the parameters; and the sensitivity of the index $\xi$ to the matrix **M**. The sensitivity

<sup>7</sup>It is now known that reactivity is a common property of many ecological systems, including populations described by discrete matrix population models (Neubert and Caswell 1997; Chen and Cohen 2001; Neubert et al. 2004; Marvier et al. 2004; Caswell and Neubert 2005; Verdy and Caswell 2008).

of *ξ* to **M** depends on which index, but the calculations involve perturbations of eigenvalues, singular values, or the matrix exponential, and are given in Verdy and Caswell (2008). The derivative of the linearization **M** is obtained by differentiating all the terms in Eq. (10.10); the result, along with several examples, is given in Verdy and Caswell (2008, eq. (37)).

#### *10.2.5 Elasticity Analysis*

The derivatives in the matrix $d\hat{\mathbf{n}}/d\boldsymbol{\theta}^{\mathsf{T}}$ give the results of small additive perturbations of the parameters. It is often useful to study the elasticities, which give the proportional result of small proportional perturbations,

$$\frac{\epsilon\hat{\mathbf{n}}}{\epsilon\boldsymbol{\theta}^{\mathsf{T}}} = \mathcal{D}\left(\hat{\mathbf{n}}\right)^{-1} \frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\mathsf{T}}} \mathcal{D}\left(\boldsymbol{\theta}\right).\tag{10.37}$$

The elasticity of any other (scalar- or vector-valued) dependent variable $f(\hat{\mathbf{n}})$ is given by

$$\frac{\epsilon f(\hat{\mathbf{n}})}{\epsilon \theta^{\mathsf{T}}} = \mathcal{D} \left( f(\hat{\mathbf{n}}) \right)^{-1} \frac{df(\hat{\mathbf{n}})}{d\theta^{\mathsf{T}}} \mathcal{D}(\theta). \tag{10.38}$$

As usual, elasticities can be calculated only when $\boldsymbol{\theta} \geq \mathbf{0}$ and $f(\hat{\mathbf{n}}) > 0$.
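In matrix terms, (10.37) and (10.38) are just pre- and post-multiplications by diagonal matrices. A minimal sketch, with hypothetical numbers chosen purely for illustration:

```python
import numpy as np

# Elasticities from sensitivities via (10.37): pre- and post-multiplication
# by diagonal matrices. All numbers here are hypothetical illustrations.
n_hat = np.array([2.0, 5.0, 10.0])           # an equilibrium vector
theta = np.array([0.5, 1.5])                 # a parameter vector
dn_dtheta = np.array([[0.3, -0.1],
                      [1.0,  0.4],
                      [-0.2, 2.0]])          # a hypothetical sensitivity matrix

E = np.diag(1.0 / n_hat) @ dn_dtheta @ np.diag(theta)

# Entry (i, j) of E is (theta_j / n_i) * d n_i / d theta_j
print(np.isclose(E[2, 1], theta[1] / n_hat[2] * dn_dtheta[2, 1]))  # True
```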

**Example 2: Metabolic population size in** *Tribolium* Flour beetles of the genus *Tribolium* have been the subject of a long series of experiments on nonlinear population dynamics (reviewed by Cushing et al. 2003). *Tribolium* lives in stored flour. In addition to feeding on the flour, adults and larvae cannibalize eggs, and adults cannibalize pupae. These interactions are the source of nonlinearity in the demography, and are captured in a three-stage (larvae, pupae, and adults) model. The projection matrix is

$$\mathbf{A}[\boldsymbol{\theta}, \mathbf{n}] = \begin{pmatrix} 0 & 0 & b \exp(-c_{el}n_1 - c_{ea}n_3) \\ 1 - \mu_l & 0 & 0 \\ 0 & \exp(-c_{pa}n_3) & 1 - \mu_a \end{pmatrix} \tag{10.39}$$

where $b$ is the clutch size; $c_{ea}$, $c_{el}$, and $c_{pa}$ are rates of cannibalism (of eggs by adults, eggs by larvae, and pupae by adults, respectively); and $\mu_l$ and $\mu_a$ are larval and adult mortalities (the mortality of pupae, in these laboratory conditions, is effectively zero). Parameter values from an experiment reported by Costantino et al. (1997)

$$b = 6.598, \qquad c_{ea} = 1.155 \times 10^{-2}, \qquad c_{el} = 1.209 \times 10^{-2},$$

$$c_{pa} = 4.7 \times 10^{-3}, \qquad \mu_a = 7.729 \times 10^{-3}, \qquad \mu_l = 2.055 \times 10^{-1}$$

produce a stable equilibrium

$$
\hat{\mathbf{n}} = \begin{pmatrix} 22.6\\ 18.0\\ 385.2 \end{pmatrix}. \tag{10.40}
$$

The sensitivity of $\hat{\mathbf{n}}$ is calculated using (10.16). However, the damage caused by *Tribolium* as a pest of stored grain products might well depend more on metabolism than on numbers. Emekci et al. (2001) estimated the metabolic rates of larvae, pupae, and adults as 9, 1, and 4.5 $\mu$l CO$_2$ h$^{-1}$, respectively. We define the metabolic population size as $N_m(t) = \mathbf{c}^{\mathsf{T}}\mathbf{n}(t)$, where $\mathbf{c}^{\mathsf{T}} = \begin{pmatrix} 9 & 1 & 4.5 \end{pmatrix}$, and calculate the sensitivity and elasticity of $\hat{N}_m$ using (10.31) and (10.37).
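As a numerical check on this example, the equilibrium (10.40) can be confirmed as a fixed point of the projection matrix (10.39), and recovered by iterating the map; this sketch assumes only the parameter values listed above:

```python
import numpy as np

# Tribolium model (10.39): verify that the equilibrium (10.40) is a fixed
# point, and recover it by iterating the nonlinear map.
b, c_ea, c_el = 6.598, 1.155e-2, 1.209e-2
c_pa, mu_a, mu_l = 4.7e-3, 7.729e-3, 2.055e-1

def project(n):
    L, P, A = n
    return np.array([b * np.exp(-c_el * L - c_ea * A) * A,
                     (1 - mu_l) * L,
                     np.exp(-c_pa * A) * P + (1 - mu_a) * A])

n_hat = np.array([22.6, 18.0, 385.2])          # equilibrium from (10.40)
print(np.max(np.abs(project(n_hat) - n_hat)))  # small residual

n = np.array([10.0, 10.0, 10.0])
for _ in range(20000):                         # the equilibrium is stable
    n = project(n)
print(np.round(n, 1))
```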

Figure 10.1 shows the elasticity of $\hat{\mathbf{n}}$ and $\hat{N}_m$ to each of the parameters. The elasticities are diverse and perhaps counterintuitive. Increases in fecundity increase the equilibrium density of all stages; increases in the cannibalism of eggs by adults reduce the density of all stages. But increased cannibalism of pupae by adults increases the density of larvae and pupae, as does an increase in the mortality of adults.

**Fig. 10.1** Sensitivity analysis of equilibrium for the flour beetle *Tribolium* in Example 2. (**a**) The elasticity of the equilibrium **n**ˆ to the parameters (see Example 2 for definitions). (**b**) The elasticity of the equilibrium population respiration rate *N*ˆ*<sup>m</sup>* to the parameters

When the stages are weighted by their metabolic rate, the elasticity of $\hat{N}_m$ to fecundity is positive, but the elasticities to all the other parameters (cannibalism rates and mortalities) are negative. The positive effects of $c_{pa}$ and $\mu_a$ on $\hat{\mathbf{n}}$ disappear when the stages are weighted according to metabolism.

#### *10.2.6 Continuous-Time Models*

We have focused on discrete-time models throughout this book. An analogous perturbation analysis can be carried out on continuous-time models of the form

$$\frac{d\mathbf{n}}{dt} = \mathbf{A}\left[\mathbf{n}(t)\right]\mathbf{n}(t) \tag{10.41}$$

Verdy and Caswell (2008) give a parallel treatment of the continuous and discrete models. The linearization at $\hat{\mathbf{n}}$ is, once again, given by (10.10). If all the eigenvalues of **M** have negative real parts, the equilibrium is locally stable.

The sensitivity of the equilibrium **n**ˆ is

$$\frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left\{-\mathbf{A} - \left(\hat{\mathbf{n}}^{\mathsf{T}} \otimes \mathbf{I}\_{\mathsf{s}}\right) \frac{\partial \text{vec}\,\mathbf{A}}{\partial \mathbf{n}^{\mathsf{T}}}\right\}^{-1} \left(\hat{\mathbf{n}}^{\mathsf{T}} \otimes \mathbf{I}\_{\mathsf{s}}\right) \frac{\partial \text{vec}\,\mathbf{A}}{\partial \boldsymbol{\theta}^{\mathsf{T}}},\tag{10.42}$$

with **A** and all its derivatives evaluated at the equilibrium **n**ˆ. Substituting (10.10) for **M** gives

$$\frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\mathsf{T}}} = -\mathbf{M}^{-1} \left( \hat{\mathbf{n}}^{\mathsf{T}} \otimes \mathbf{I}\_{\mathsf{s}} \right) \frac{\partial \text{vec}\,\mathbf{A}}{\partial \boldsymbol{\theta}^{\mathsf{T}}},\tag{10.43}$$

and **M** is nonsingular unless 0 is an eigenvalue of **M**, which corresponds to a bifurcation point of the equilibrium.
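As a minimal check of (10.43), consider the scalar logistic model $dn/dt = r(1 - n/K)n$, for which $\mathbf{A}[n] = r(1 - n/K)$ is $1 \times 1$ and $\hat{n} = K$. The sensitivity of the equilibrium to $\boldsymbol{\theta} = (r \;\; K)^{\mathsf{T}}$ should be $(0 \;\; 1)$, and (10.43) recovers exactly that:

```python
import numpy as np

# Check (10.43) on the scalar logistic model dn/dt = r(1 - n/K) n,
# where A[n] = r(1 - n/K) is 1x1 and the equilibrium is n_hat = K.
r, K = 0.5, 100.0
n_hat = K

# M = A + (n_hat ⊗ I) dA/dn, evaluated at the equilibrium
A = r * (1 - n_hat / K)                  # = 0 at equilibrium
M = A + n_hat * (-r / K)                 # = -r

# dvecA/dtheta^T at the equilibrium, with theta = (r, K)
dA_dtheta = np.array([[1 - n_hat / K, r * n_hat / K ** 2]])

dn_dtheta = -(1 / M) * n_hat * dA_dtheta
print(dn_dtheta)  # dn/dr = 0, dn/dK = 1
```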

#### **10.3 Environmental Feedback Models**

Environmental (or economic) feedback models write the vital rates as functions of some environmental variable, which in turn depends on population density. Feedback models may be static or dynamic. In static feedback models, the environment depends only on current conditions, with no inherent dynamics of its own. In dynamic feedback models, the environment can have dynamics as complicated as those of the population (e.g., if the environmental variable were the abundance of a prey species, affecting the dynamics of a predator species). The sensitivity analysis of dynamic feedback models is given in Sect. 10.8.

A static feedback model can be written

$$\mathbf{n}(t+1) = \mathbf{A}[\theta, \mathbf{n}(t), \mathbf{g}(t)] \text{ } \mathbf{n}(t) \tag{10.44}$$

$$\mathbf{g}(t) = \mathbf{g}[\theta, \mathbf{n}(t)] \tag{10.45}$$

where **g***(t)* is a vector (of dimension *q* × 1) describing the ecological or economic aspects of the environment on which the vital rates depend. As written here, the model admits the possibility that the vital rates in **A** might depend directly on **n** as well as on the environment.

At equilibrium

$$
\hat{\mathbf{n}} = \mathbf{A}[\theta, \hat{\mathbf{n}}, \hat{\mathbf{g}}] \hat{\mathbf{n}} \tag{10.46}
$$

$$
\hat{\mathbf{g}} = \mathbf{g}[\boldsymbol{\theta}, \hat{\mathbf{n}}].\tag{10.47}
$$

Differentiating these expressions gives

$$d\hat{\mathbf{n}} = \mathbf{A}(d\hat{\mathbf{n}}) + (d\mathbf{A})\hat{\mathbf{n}}\tag{10.48}$$

$$d\hat{\mathbf{g}} = \frac{\partial \hat{\mathbf{g}}}{\partial \boldsymbol{\theta}^{\mathsf{T}}} d\boldsymbol{\theta} + \frac{\partial \hat{\mathbf{g}}}{\partial \mathbf{n}^{\mathsf{T}}} d\hat{\mathbf{n}}.\tag{10.49}$$

Applying the vec operator to (10.48) and expanding *d*vec **A** gives

$$d\hat{\mathbf{n}} = \left(\hat{\mathbf{n}}^{\mathsf{T}} \otimes \mathbf{I}\_{\mathsf{s}}\right) \left[\frac{\partial \text{vec}\,\mathbf{A}}{\partial \boldsymbol{\theta}^{\mathsf{T}}} d\boldsymbol{\theta} + \frac{\partial \text{vec}\,\mathbf{A}}{\partial \mathbf{g}^{\mathsf{T}}} d\hat{\mathbf{g}}\right] + \mathbf{A} d\hat{\mathbf{n}}.\tag{10.50}$$

Substituting (10.49) for *d***g**ˆ and rearranging gives

$$d\hat{\mathbf{n}} = \left(\hat{\mathbf{n}}^{\mathsf{T}} \otimes \mathbf{I}\_{s}\right) \left[\frac{\partial \text{vec}\,\mathbf{A}}{\partial \boldsymbol{\theta}^{\mathsf{T}}} + \frac{\partial \text{vec}\,\mathbf{A}}{\partial \mathbf{g}^{\mathsf{T}}} \frac{\partial \hat{\mathbf{g}}}{\partial \boldsymbol{\theta}^{\mathsf{T}}}\right] d\boldsymbol{\theta}$$

$$+ \left[\mathbf{A} + \left(\hat{\mathbf{n}}^{\mathsf{T}} \otimes \mathbf{I}\_{s}\right) \frac{\partial \text{vec}\,\mathbf{A}}{\partial \mathbf{g}^{\mathsf{T}}} \frac{\partial \hat{\mathbf{g}}}{\partial \mathbf{n}^{\mathsf{T}}}\right] d\hat{\mathbf{n}}.\tag{10.51}$$

Solving for *d***n**ˆ and applying the identification theorem yields

$$\frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left[\mathbf{I}\_{s} - \mathbf{A} - \left(\hat{\mathbf{n}}^{\mathsf{T}} \otimes \mathbf{I}\_{s}\right) \frac{\partial \text{vec}\,\mathbf{A}}{\partial \mathbf{g}^{\mathsf{T}}} \frac{\partial \hat{\mathbf{g}}}{\partial \mathbf{n}^{\mathsf{T}}}\right]^{-1}$$

$$\times \left(\hat{\mathbf{n}}^{\mathsf{T}} \otimes \mathbf{I}_{s}\right) \left[\frac{\partial \text{vec}\,\mathbf{A}}{\partial \boldsymbol{\theta}^{\mathsf{T}}} + \frac{\partial \text{vec}\,\mathbf{A}}{\partial \mathbf{g}^{\mathsf{T}}} \frac{\partial \hat{\mathbf{g}}}{\partial \boldsymbol{\theta}^{\mathsf{T}}}\right]. \tag{10.52}$$

In this expansion, **A**, **g**, and all derivatives are evaluated at $(\hat{\mathbf{n}}, \hat{\mathbf{g}})$. A comparison of (10.52) with (10.16) shows that including the feedback mechanism has simply written $d\text{vec}\,\mathbf{A}/d\mathbf{n}^{\mathsf{T}}$ and $d\text{vec}\,\mathbf{A}/d\boldsymbol{\theta}^{\mathsf{T}}$ in terms of **g** using the chain rule.

The environmental variable **g** may be of interest in its own right (e.g., in the food ratio model of Lee and Tuljapurkar (2008), in which it is a measure of well-being in terms of food per individual). The sensitivity of $\hat{\mathbf{g}}$ at equilibrium is

$$\frac{d\hat{\mathbf{g}}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \frac{\partial \hat{\mathbf{g}}}{\partial \boldsymbol{\theta}^{\mathsf{T}}} + \frac{\partial \hat{\mathbf{g}}}{\partial \mathbf{n}^{\mathsf{T}}} \frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\mathsf{T}}} \tag{10.53}$$

where the partial derivatives of $\hat{\mathbf{g}}$ are obtained by differentiating (10.45), and $d\hat{\mathbf{n}}/d\boldsymbol{\theta}^{\mathsf{T}}$ is given by (10.52).

#### **10.4 Subsidized Populations and Competition for Space**

A subsidized population is one in which new individuals are recruited from elsewhere rather than (or in addition to) being generated by local reproduction. Subsidy is important in many plant and animal populations, especially of benthic marine invertebrates and fish. Many of these species produce planktonic larvae that may disperse very long distances (Scheltema 1971) before they settle and become sessile for the rest of their lives. Thus a significant part—maybe even all—of the recruitment at any location is independent of local fertility (e.g., Almany et al. 2007). Subsidized models have been used to analyze conservation programs in which captive-reared animals are released into a wild or re-established population (Sarrazin and Legendre 2000). They have been applied to the demography of human organizations; e.g., schools, businesses, learned societies (Gani 1963; Pollard 1968; Bartholomew 1982). They are also the basis of cohort-component population projections that include immigration.

In the simplest subsidized models, both local demography and recruitment are density-independent. Alternatively, recruitment may depend on some resource (e.g., space) whose availability depends on the local population, or the local demography after settlement may be density-dependent. All three cases can lead to equilibrium populations.

#### *10.4.1 Density-Independent Subsidized Populations*

The model,

$$\mathbf{n}(t+1) = \mathbf{A}[\theta]\mathbf{n}(t) + \mathbf{b}[\theta],\tag{10.54}$$

includes a subsidy vector **b** giving the input of individuals to the population.<sup>8</sup> The parameters $\boldsymbol{\theta}$ may affect **A** or **b**, or both. If the fertility appearing in **A** is below replacement, so that $\lambda < 1$, then a stable equilibrium $\hat{\mathbf{n}}$ exists.<sup>9</sup> This equilibrium satisfies

$$
\hat{\mathbf{n}} = \mathbf{A}\hat{\mathbf{n}} + \mathbf{b} \tag{10.55}
$$

$$\hat{\mathbf{n}} = (\mathbf{I}_s - \mathbf{A})^{-1} \mathbf{b}.\tag{10.56}$$

Differentiating (10.55) and applying the vec operator yields

$$d\hat{\mathbf{n}} = \left(\hat{\mathbf{n}}^{\mathsf{T}} \otimes \mathbf{I}\_{\mathsf{s}}\right) d\mathsf{vec}\,\mathbf{A} + \mathbf{A}\left(d\hat{\mathbf{n}}\right) + d\mathbf{b} \tag{10.57}$$

Solving for *d***n**ˆ and applying the chain rule gives the sensitivity of the equilibrium,

$$\frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\mathbf{I}\_{\mathsf{s}} - \mathbf{A}\right)^{-1} \left\{ \left(\hat{\mathbf{n}}^{\mathsf{T}} \otimes \mathbf{I}\_{\mathsf{s}}\right) \frac{d\text{vec}\,\mathbf{A}}{d\boldsymbol{\theta}^{\mathsf{T}}} + \frac{d\mathbf{b}}{d\boldsymbol{\theta}^{\mathsf{T}}} \right\}.\tag{10.58}$$
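A small numerical illustration of (10.56) and (10.58): for a hypothetical three-stage model with $\lambda < 1$ (the matrix and subsidy vector below are invented for the sketch), perturbations of the subsidy vector alone give $d\hat{\mathbf{n}}/d\mathbf{b}^{\mathsf{T}} = (\mathbf{I}_s - \mathbf{A})^{-1}$, which can be checked by finite differences:

```python
import numpy as np

# A hypothetical 3-stage subsidized model (10.54) with below-replacement
# local demography (lambda < 1), its equilibrium (10.56), and a
# finite-difference check that d n_hat / d b^T = (I - A)^{-1}.
A = np.array([[0.0, 0.4, 0.6],
              [0.5, 0.0, 0.0],
              [0.0, 0.7, 0.3]])
b = np.array([6.0, 0.0, 0.0])         # subsidy: 6 recruits per time step

lam = np.max(np.abs(np.linalg.eigvals(A)))
assert lam < 1                        # guarantees a stable equilibrium

IA = np.eye(3) - A
n_hat = np.linalg.solve(IA, b)        # equilibrium (10.56)

# With theta = b, dvecA/dtheta^T = 0 and (10.58) reduces to (I - A)^{-1}
S = np.linalg.inv(IA)
h = 1e-6
fd = np.array([(np.linalg.solve(IA, b + h * e) - n_hat) / h
               for e in np.eye(3)]).T
print(np.allclose(S, fd))  # True
```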

**Example 3: The Australian Academy of Sciences** Most human organizations are subsidized; recruits (new students in a school, new employees in a company) come from outside, not from local reproduction. In an early example of a subsidized population model, Pollard (1968) analyzed the age structure of the Australian Academy of Sciences, recruitment to which takes place by election.<sup>10</sup> The Academy had been founded in 1954, and between 1955 and 1963 had elected about 6 new Fellows each year, with an age distribution given in Pollard (1968, Table 2).


<sup>8</sup>The same model could describe harvest if **<sup>b</sup>** <sup>≤</sup> 0 (e.g., Hauser et al. 2006). This form of harvest produces unstable equilibria, and is not considered further here.

<sup>9</sup>If *λ >* 1, the population grows exponentially and the subsidy eventually becomes negligible. The equilibrium in this case is non-positive (and hence biologically irrelevant) and unstable. If *λ* = 1 then the population would remain constant in the absence of subsidy; any non-zero subsidy will then lead to unbounded population growth.

<sup>10</sup>Pollard's paper is remarkable for its treatment of both deterministic and stochastic models, but here I consider only the deterministic case.

Pollard interpolated this distribution to 1-year age classes, and combined it with a 1954 life table for Australian males (only one woman, the redoubtable geologist Dorothy Hill in 1956, had been elected to the Academy prior to 1969) to construct a model of the form (10.54). He calculated the equilibrium size and age composition of the Academy. Here, I have used the male life table for Australia 1953–1955 in Keyfitz and Flieger (1968, p. 558) to construct an age-classified matrix **A** with age-specific probabilities of survival $P_i$ on its subdiagonal and zeros elsewhere. Were these vital rates and the age distribution of the subsidy vector to remain constant, the Academy would reach an equilibrium size of $\hat{N} = 149.5$ with an age distribution $\hat{\mathbf{n}}$ shown in Fig. 10.2a.

As parameters, consider the age-specific mortality rates $\mu_i = -\log P_i$, and define the parameter vector $\boldsymbol{\theta} = \begin{pmatrix} \mu_1 & \mu_2 & \cdots \end{pmatrix}^{\mathsf{T}}$. Equation (10.58) then gives the sensitivity of the equilibrium population to changes in age-specific mortality. The

**Fig. 10.2** Analysis of the equilibrium of a linear subsidized model for the Australian Academy of Science, based on Pollard (1968). (**a**) The equilibrium age structure of the Academy, assuming recruitment of 6 members per year. (**b**) The sensitivity, to changes in age-specific mortality, of the number of members. (**c**) The sensitivity, to changes in age-specific mortality, of the proportion of members over 70 years old

sensitivity of the total size of the Academy, $\hat{N} = \mathbf{1}^{\mathsf{T}}\hat{\mathbf{n}}$, calculated using (10.31), is shown in Fig. 10.2b. It shows that increases in mortality reduce $\hat{N}$ (not surprisingly), with the greatest effect coming from changes in mortality at ages 48–58.

Because learned societies are often concerned with their age distributions, Pollard (1968) examined the proportion of members over age 70. At equilibrium, this proportion is $\hat{R} = 0.26$. The sensitivity $d\hat{R}/d\boldsymbol{\theta}^{\mathsf{T}}$, calculated using (10.33), is shown in Fig. 10.2c. Increases in mortality before age 48 would increase the proportion of members over 70, while increases in mortality after age 48 would decrease it.<sup>11</sup>

#### *10.4.2 Linear Subsidized Models with Competition for Space*

Recruitment in subsidized populations may be limited by the availability of a resource. Roughgarden et al. (1985; see also Pascual and Caswell 1991) presented a model for a population of sessile organisms, such as barnacles, in which recruitment is limited by available space. Barnacles<sup>12</sup> produce larvae that disperse in the plankton for several weeks before settling onto a rock surface or other suitable substrate, after which they no longer move.

Roughgarden's model supposes that settlement is proportional to the free space *F (t)*. Thus the subsidy vector is

$$\mathbf{b}(t) = \begin{pmatrix} \phi F(t) & 0 & \cdots & 0 \end{pmatrix}^{\mathsf{T}},\tag{10.59}$$

where *φ* is the settlement rate per unit of free space, and is determined by the pool of available larvae. The free space is the difference between the total area *A* and the space occupied by the population,

$$F(t) = A - \mathbf{g}^{\mathsf{T}}\mathbf{n}(t)\tag{10.60}$$

where **g** is a vector of stage-specific basal areas. Suppose that all locally-produced larvae are advected away, so that the first row of **A** is zero. Then, substituting (10.60) into (10.59) and rearranging gives

$$\mathbf{n}(t+1) = \mathbf{B}\mathbf{n}(t) + \begin{pmatrix} \phi A & 0 & \cdots & 0 \end{pmatrix}^{\mathsf{T}} \tag{10.61}$$

<sup>11</sup>It is possible to calculate the average age of the Academy, and its sensitivity, using results to be introduced in Sect. 10.5.4. The response is very similar to that of the proportion over age 70.

<sup>12</sup>The temptation to draw analogies between barnacles and the members of learned academies is almost irresistible.

where

$$\mathbf{B} = \begin{pmatrix} -\phi g_1 & -\phi g_2 & \cdots & -\phi g_s \\ a_{21} & a_{22} & \cdots & a_{2s} \\ \vdots & \vdots & \ddots & \vdots \\ a_{s1} & a_{s2} & \cdots & a_{ss} \end{pmatrix} \tag{10.62}$$

Although it includes competition for space, the model is linear. The equilibrium $\hat{\mathbf{n}}$ of (10.61) is stable if the spectral radius of **B** is less than one.<sup>13</sup> The formula (10.58) gives the sensitivity of this equilibrium to changes in the vital rates, the settlement rate, or the individual growth rate. This model might apply to any situation where the recruitment of new individuals depends on the availability of a resource (space, jobs, housing) that can be monopolized by residents.

**Example 4: Intertidal barnacles** Gaines and Roughgarden (1985) modelled a population of the barnacle *Balanus glandula* in central California. At one site (denoted KLM in their paper), they reported age-independent survival with a probability of $P_i = 0.985$ per week, $i = 1, \ldots, 52$. The growth in basal area of an individual barnacle could be described by $g_x = \pi(\rho x)^2$, where $x$ is age in weeks and $\rho$ is the radial growth rate ($\rho = 0.0041$ cm/wk). The mean settlement rate was $\phi = 0.107$. The matrix **B** contains survival probabilities $P_i$ on the subdiagonal, terms of the form $-\phi g_i$ in the first row, and zeros elsewhere.

The equilibrium population $\hat{\mathbf{n}}$ has an exponential age distribution (Fig. 10.3a). It is scaled here relative to total area, so $A = 1$. The equilibrium proportion of free space is $\hat{F} = 0.865$.
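The calculations in this example can be reproduced, at least approximately, from the parameter values quoted above (a sketch; small numerical differences from the published figures are to be expected):

```python
import numpy as np

# A sketch of the Balanus model of Example 4, built from the parameter
# values quoted in the text.
s, P, phi, rho, A_area = 52, 0.985, 0.107, 0.0041, 1.0

g = np.pi * (rho * np.arange(1, s + 1)) ** 2   # basal area g_x = pi (rho x)^2

B = np.zeros((s, s))
B[0, :] = -phi * g                             # competition for space (10.62)
B[np.arange(1, s), np.arange(s - 1)] = P       # survival on the subdiagonal

b = np.zeros(s)
b[0] = phi * A_area                            # subsidy term in (10.61)

n_hat = np.linalg.solve(np.eye(s) - B, b)      # equilibrium, as in (10.56)
F_hat = A_area - g @ n_hat                     # free space (10.60)

print(round(F_hat, 3))                          # close to the reported 0.865
print(np.allclose(n_hat[1:], P * n_hat[:-1]))   # exponential age distribution
```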

To calculate sensitivities, let the parameters be the age-specific survival probabilities, so that $\boldsymbol{\theta} = \begin{pmatrix} P_1 & \cdots & P_{52} \end{pmatrix}^{\mathsf{T}}$. Some of the possible sensitivities are shown in Fig. 10.3. Increasing survival at age $j$ (ages $j = 10, 20, 40$ are shown) reduces the abundance of ages younger than $j$ and increases the abundance of ages older than $j$ (Fig. 10.3b). A perturbation to a parameter, call it $\xi$, that affects survival at all ages would have the effect

$$\frac{d\hat{\mathbf{n}}}{d\xi} = \frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\mathsf{T}}}\frac{d\boldsymbol{\theta}}{d\xi} = \frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\mathsf{T}}}\,\mathbf{1} \tag{10.63}$$

where **1** is a vector of ones. An increase in overall survival would reduce the abundance of age classes 1–6 and increase the abundance of older age classes (Fig. 10.3c).

<sup>13</sup>Because **B** contains negative elements, its dominant eigenvalue may be complex or negative, leading to oscillatory approach to the equilibrium.

The sensitivity of $\hat{\mathbf{n}}$ to the larval settlement rate $\phi$ is obtained from (10.58) by setting $d\text{vec}\,\mathbf{B}/d\phi = \mathbf{0}_{s^2 \times 1}$, and

$$\frac{d\mathbf{b}}{d\phi} = \begin{pmatrix} \hat{F} & 0 & \cdots & 0 \end{pmatrix}^{\mathsf{T}}.$$

Not surprisingly, increases in *φ* increase **n**ˆ, with the largest effect on the young age classes (Fig. 10.3d). The sensitivity of **n**ˆ to the radial growth rate *ρ* is obtained by writing

$$\frac{d\text{vec}\,\mathbf{B}}{d\rho} = \frac{d\text{vec}\,\mathbf{B}}{d\mathbf{g}^{\mathsf{T}}} \frac{d\mathbf{g}}{d\rho} \tag{10.64}$$

This sensitivity is negative, with the greatest impact on young age classes (Fig. 10.3e).

**Fig. 10.3** Sensitivity analysis of a subsidized population of the intertidal barnacle *Balanus glandula*. (**a**) The equilibrium population $\hat{\mathbf{n}}$ (scaled relative to a unit of area $A = 1$). (**b**) The sensitivity of $\hat{\mathbf{n}}$ to a change in survival at ages $j = 10, 20, 40$. (**c**) The sensitivity of $\hat{\mathbf{n}}$ to changes in overall survival at all ages. (**d**) The sensitivity of $\hat{\mathbf{n}}$ to the settlement rate $\phi$ per unit area.

**Fig. 10.3** (continued) (**e**) The sensitivity of $\hat{\mathbf{n}}$ to the radial growth rate $\rho$. (**f**) The sensitivity of the equilibrium free space $\hat{F}$ to age-specific survival. (**g**) The sensitivity of $\hat{F}$ to changes in overall survival, settlement rate, and radial growth rate. Based on data of Gaines and Roughgarden (1985)

Finally, the sensitivity of the equilibrium free space is given by

$$\frac{d\hat{F}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \frac{d\hat{F}}{d\mathbf{n}^{\mathsf{T}}} \frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\mathsf{T}}} = -\mathbf{g}^{\mathsf{T}} \frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\mathsf{T}}} \tag{10.65}$$

Increases in survival reduce the amount of free space at equilibrium; the effect is largest for changes in survival of young age classes (Fig. 10.3f). Figure 10.3g compares the effects on $\hat{F}$ of changes in overall survival, settlement, and radial growth rate. It is not surprising that increases in survival or settlement reduce free space, but it is perhaps surprising that increases in the radial growth rate actually increase $\hat{F}$.

#### *10.4.3 Density-Dependent Subsidized Models*

Once individuals arrive in the population, they may experience a variety of density-dependent effects, which can be incorporated in a model

$$\mathbf{n}(t+1) = \mathbf{A}\left[\theta, \mathbf{n}(t)\right] \mathbf{n}(t) + \mathbf{b}.\tag{10.66}$$

The sensitivity result (10.58) applies to this model by substituting

$$d\text{vec}\,\mathbf{A} = \frac{\partial \text{vec}\,\mathbf{A}}{\partial \boldsymbol{\theta}^{\mathsf{T}}}d\boldsymbol{\theta} + \frac{\partial \text{vec}\,\mathbf{A}}{\partial \mathbf{n}^{\mathsf{T}}}d\mathbf{\hat{n}}\tag{10.67}$$

into (10.57) and solving for *d***n**ˆ, to obtain

$$\frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\mathbf{I}\_{s} - \mathbf{A} - \left(\hat{\mathbf{n}}^{\mathsf{T}} \otimes \mathbf{I}\_{s}\right) \frac{\partial \text{vec}\,\mathbf{A}}{\partial \mathbf{n}^{\mathsf{T}}}\right)^{-1} \left\{ \left(\hat{\mathbf{n}}^{\mathsf{T}} \otimes \mathbf{I}\_{s}\right) \frac{\partial \text{vec}\,\mathbf{A}}{\partial \boldsymbol{\theta}^{\mathsf{T}}} + \frac{d\mathbf{b}}{d\boldsymbol{\theta}^{\mathsf{T}}}\right\}.\tag{10.68}$$

where **A**, **b**, and all derivatives of **A** and **b** are evaluated at **n**ˆ.

#### **10.5 Stable Structure and Reproductive Value**

The linear model $\mathbf{n}(t+1) = \mathbf{A}\mathbf{n}(t)$ will, if **A** is primitive, converge to a stable age or stage distribution. But while the dynamics of the population vector $\mathbf{n}(t)$ are linear, the dynamics of the *proportional* population structure are nonlinear (Tuljapurkar 1997). We can take advantage of this to analyze the sensitivity of proportional structures by writing them as equilibria of nonlinear maps.

#### *10.5.1 Stable Structure*

The sensitivity of the stable stage distribution has been approached as an eigenvector perturbation problem (e.g., Caswell 1982, 2001; Kirkland and Neumann 1994), but those calculations are complicated. Analysis of the equilibrium of the nonlinear model (10.69) is much easier.

Let **p** denote the proportional stage structure vector ($\mathbf{p} \geq \mathbf{0}$, $\mathbf{1}^{\mathsf{T}}\mathbf{p} = 1$). The dynamics of $\mathbf{p}(t)$ satisfy

$$\mathbf{p}(t+1) = \frac{\mathbf{A}\mathbf{p}(t)}{\|\mathbf{A}\mathbf{p}(t)\|}. \tag{10.69}$$

The stable stage distribution is an equilibrium of (10.69); it satisfies

$$
\hat{\mathbf{p}} = \frac{\mathbf{A}\hat{\mathbf{p}}}{\mathbf{1}^{\mathsf{T}}\mathbf{A}\hat{\mathbf{p}}} \tag{10.70}
$$

where the 1-norm can be replaced by $\mathbf{1}^{\mathsf{T}}\mathbf{A}\hat{\mathbf{p}}$ because $\mathbf{A}\hat{\mathbf{p}}$ is non-negative. Differentiating both sides gives

$$d\hat{\mathbf{p}} = \frac{1}{\left(\mathbf{1}^{\mathsf{T}}\mathbf{A}\hat{\mathbf{p}}\right)^{2}} \left[\mathbf{1}^{\mathsf{T}}\mathbf{A}\hat{\mathbf{p}}(d\mathbf{A})\hat{\mathbf{p}} + \mathbf{1}^{\mathsf{T}}\mathbf{A}\hat{\mathbf{p}}\mathbf{A}(d\hat{\mathbf{p}}) - \mathbf{A}\hat{\mathbf{p}}\mathbf{1}^{\mathsf{T}}(d\mathbf{A})\hat{\mathbf{p}} - \mathbf{A}\hat{\mathbf{p}}\mathbf{1}^{\mathsf{T}}\mathbf{A}(d\hat{\mathbf{p}})\right] \tag{10.71}$$

Note that $\mathbf{A}\hat{\mathbf{p}} = \lambda\hat{\mathbf{p}}$ and $\mathbf{1}^{\mathsf{T}}\mathbf{A}\hat{\mathbf{p}} = \lambda$, where $\lambda$ is the dominant eigenvalue of **A**. Making these substitutions and applying the vec operator to both sides gives

$$\lambda \, d\hat{\mathbf{p}} = \left[ \left( \hat{\mathbf{p}}^{\mathsf{T}} \otimes \mathbf{I}_{s} \right) - \left( \hat{\mathbf{p}}^{\mathsf{T}} \otimes \hat{\mathbf{p}} \mathbf{1}^{\mathsf{T}} \right) \right] d\text{vec}\,\mathbf{A} + \left[ \mathbf{A} - \hat{\mathbf{p}} \mathbf{1}^{\mathsf{T}} \mathbf{A} \right] d\hat{\mathbf{p}} \tag{10.72}$$

Solving for *d***p**ˆ and applying the chain rule gives

$$\frac{d\hat{\mathbf{p}}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\lambda \mathbf{I}\_{\mathsf{s}} - \mathbf{A} + \hat{\mathbf{p}} \mathbf{1}^{\mathsf{T}} \mathbf{A}\right)^{-1} \left(\hat{\mathbf{p}}^{\mathsf{T}} \otimes \mathbf{I}\_{\mathsf{s}} - \hat{\mathbf{p}}^{\mathsf{T}} \otimes \hat{\mathbf{p}} \mathbf{1}^{\mathsf{T}}\right) \frac{d\text{vec}\,\mathbf{A}}{d\boldsymbol{\theta}^{\mathsf{T}}} \tag{10.73}$$

**Example 5: A human age distribution** As an example, consider the age distribution of the population of the United States in 1985 (data from Keyfitz and Flieger 1990). These vital rates yield a declining population ($\lambda = 0.975$) and an age distribution skewed towards older ages (Fig. 10.4). Applying (10.73) yields the sensitivity of $\hat{\mathbf{p}}$ to age-specific survival probabilities $P_i$ and fertilities $F_i$, where age classes $i = 1, \ldots, 18$ correspond to ages 0–5, ..., 85–90. The overall patterns are familiar from previous sensitivity analyses of stable age distributions (e.g., Caswell 2001, Figure 9.22). Increasing survival probability at a given age increases the relative abundance of the next several age classes, at the expense of younger and older age classes. Increasing fertility at a given age increases the abundance of young age classes at the expense of older age classes.
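Formula (10.73) can be checked against a direct eigenvector computation. The sketch below uses a small hypothetical Leslie matrix (the entries are invented for illustration), perturbs a single entry of **A**, and compares the analytic sensitivity of the stable distribution with a finite difference:

```python
import numpy as np

# Finite-difference check of (10.73) on a small hypothetical Leslie matrix:
# the sensitivity of the stable stage distribution to one entry of A.
A = np.array([[0.0, 1.2, 1.5],
              [0.6, 0.0, 0.0],
              [0.0, 0.8, 0.0]])
s = A.shape[0]

def stable_dist(A):
    vals, vecs = np.linalg.eig(A)
    k = np.argmax(vals.real)
    w = np.abs(vecs[:, k].real)
    return vals[k].real, w / w.sum()   # dominant eigenvalue, 1^T p = 1

lam, p = stable_dist(A)
ones = np.ones(s)

# dvecA/dtheta for a perturbation of the single entry a_{ij}
i, j = 1, 0
dvecA = np.zeros(s * s)
dvecA[j * s + i] = 1.0                 # column-major (vec) indexing

# Analytic sensitivity from (10.73)
lhs = lam * np.eye(s) - A + np.outer(p, ones @ A)
rhs = (np.kron(p[None, :], np.eye(s))
       - np.kron(p[None, :], np.outer(p, ones))) @ dvecA
dp = np.linalg.solve(lhs, rhs)

# Compare with a finite difference on the normalized dominant eigenvector
h = 1e-6
Ah = A.copy(); Ah[i, j] += h
dp_fd = (stable_dist(Ah)[1] - p) / h
print(np.allclose(dp, dp_fd, atol=1e-4))  # True
```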

#### *10.5.2 Reproductive Value*

A similar approach gives the sensitivity of the reproductive value vector **v**, given by the left eigenvector of **A** corresponding to *λ*. Reproductive value is customarily scaled so that $v\_1 = 1$. Scaled in this way, **v** satisfies

$$
\hat{\mathbf{v}}^{\mathsf{T}} = \frac{\hat{\mathbf{v}}^{\mathsf{T}} \mathbf{A}}{\hat{\mathbf{v}}^{\mathsf{T}} \mathbf{A} \mathbf{e}\_1} \tag{10.74}
$$

where $\mathbf{e}\_1$ is a vector with 1 in the first entry and zeros elsewhere. Differentiating both sides gives

$$d\hat{\mathbf{v}}^{\mathsf{T}} = \frac{1}{\left(\hat{\mathbf{v}}^{\mathsf{T}}\mathbf{A}\mathbf{e}\_1\right)^{2}} \left[\hat{\mathbf{v}}^{\mathsf{T}}\mathbf{A}\mathbf{e}\_1\,(d\hat{\mathbf{v}}^{\mathsf{T}})\mathbf{A} + \hat{\mathbf{v}}^{\mathsf{T}}\mathbf{A}\mathbf{e}\_1\,\hat{\mathbf{v}}^{\mathsf{T}}(d\mathbf{A}) - (d\hat{\mathbf{v}}^{\mathsf{T}})\mathbf{A}\mathbf{e}\_1\,\hat{\mathbf{v}}^{\mathsf{T}}\mathbf{A} - \hat{\mathbf{v}}^{\mathsf{T}}(d\mathbf{A})\mathbf{e}\_1\,\hat{\mathbf{v}}^{\mathsf{T}}\mathbf{A}\right] \tag{10.75}$$

**Fig. 10.4** Stable age distribution and sensitivity of stable age distribution to age-specific survival and fertility. (**a**) The stable age distribution. (**b**) The sensitivity of the stable age distribution to changes in survival (*P*5) in age class 5. (**c**) Sensitivity of the stable age distribution to changes in fertility (*F*5) in age class 5. Based on life table data for the United States in 1985 (Keyfitz and Flieger 1990)

But $\hat{\mathbf{v}}^{\mathsf{T}}\mathbf{A} = \lambda\hat{\mathbf{v}}^{\mathsf{T}}$ and $\hat{\mathbf{v}}^{\mathsf{T}}\mathbf{A}\mathbf{e}\_1 = \lambda$. Making these substitutions and applying the vec operator (remembering that $\text{vec}\,\mathbf{v}^{\mathsf{T}} = \mathbf{v}$) gives

$$
\lambda \, d\hat{\mathbf{v}} = \left[ \left( \mathbf{I}\_s \otimes \hat{\mathbf{v}}^{\mathsf{T}} \right) - \left( \hat{\mathbf{v}} \mathbf{e}\_1^{\mathsf{T}} \otimes \hat{\mathbf{v}}^{\mathsf{T}} \right) \right] d\text{vec}\,\mathbf{A} + \left( \mathbf{A}^{\mathsf{T}} - \hat{\mathbf{v}} \mathbf{e}\_1^{\mathsf{T}} \mathbf{A}^{\mathsf{T}} \right) d\hat{\mathbf{v}}. \tag{10.76}
$$

Solving for $d\hat{\mathbf{v}}$ and using the chain rule gives

$$\frac{d\hat{\mathbf{v}}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\lambda \mathbf{I}\_{s} - \mathbf{A}^{\mathsf{T}} + \hat{\mathbf{v}} \mathbf{e}\_{1}^{\mathsf{T}} \mathbf{A}^{\mathsf{T}}\right)^{-1} \left[ \left(\mathbf{I}\_{s} \otimes \hat{\mathbf{v}}^{\mathsf{T}}\right) - \left(\hat{\mathbf{v}} \mathbf{e}\_{1}^{\mathsf{T}} \otimes \hat{\mathbf{v}}^{\mathsf{T}}\right) \right] \frac{d\text{vec}\,\mathbf{A}}{d\boldsymbol{\theta}^{\mathsf{T}}} \qquad (10.77)$$
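
Equation (10.77) can be checked numerically in the same way as the stable distribution. A minimal sketch (again a hypothetical Leslie matrix, not a model from the text), with $\boldsymbol{\theta} = \text{vec}\,\mathbf{A}$ and the left eigenvector rescaled to $v\_1 = 1$, which also removes any sign ambiguity in the eigencomputation:

```python
import numpy as np

def rv(A):
    """Dominant eigenvalue and reproductive value vector, scaled so v_1 = 1."""
    vals, vecs = np.linalg.eig(A.T)
    k = np.argmax(vals.real)
    v = vecs[:, k].real
    return vals[k].real, v / v[0]

def dv_dvecA(A):
    """Sensitivity of v (scaled to v_1 = 1) to every entry of A, Eq. (10.77),
    with theta = vec A (column-major)."""
    s = A.shape[0]
    lam, v = rv(A)
    I = np.eye(s)
    e1 = np.zeros(s); e1[0] = 1.0
    left = np.linalg.inv(lam * I - A.T + np.outer(v, e1) @ A.T)
    right = np.kron(I, v[None, :]) - np.kron(np.outer(v, e1), v[None, :])
    return left @ right               # s x s^2

A = np.array([[0.0, 1.5, 1.0],        # hypothetical Leslie matrix
              [0.5, 0.0, 0.0],
              [0.0, 0.8, 0.0]])
S = dv_dvecA(A)

# central-difference check on the entry a_12 (vec index 3)
h = 1e-6
Ah, Al = A.copy(), A.copy()
Ah[0, 1] += h
Al[0, 1] -= h
fd = (rv(Ah)[1] - rv(Al)[1]) / (2 * h)
print(np.allclose(S[:, 3], fd, atol=1e-6))    # should print True
```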

In stable population theory, in the calculation of second derivatives of population growth rate (Shyu and Caswell 2014), and in the analysis of multitype branching processes for demographic stochasticity (Caswell and Vindenes 2018), it is necessary to use the sensitivity of **v** subject to the scaling

$$\mathbf{v}^{\mathsf{T}}\mathbf{w} = 1.\tag{10.78}$$

The resulting derivative is

$$\begin{split} \frac{d\mathbf{v}}{d\boldsymbol{\theta}^{\mathsf{T}}} &= \left(\lambda\mathbf{I} - \mathbf{A}^{\mathsf{T}} + \lambda\mathbf{v}\mathbf{w}^{\mathsf{T}}\right)^{-1} \\ &\times \left(\left[\left(\mathbf{I} - \mathbf{v}\mathbf{w}^{\mathsf{T}}\right)\otimes\mathbf{v}^{\mathsf{T}}\right] - \lambda\left(\mathbf{v}\otimes\mathbf{v}^{\mathsf{T}}\right)\frac{d\mathbf{w}}{d\,\text{vec}^{\mathsf{T}}\mathbf{A}}\right) \frac{d\text{vec}\,\mathbf{A}}{d\boldsymbol{\theta}^{\mathsf{T}}}\end{split} \tag{10.79}$$

(see Caswell and Vindenes 2018 for derivation).

#### *10.5.3 Sensitivity of the Dependency Ratio*

The dependency ratio characterizes an age distribution by the relative abundance of two groups, one assumed to be dependent and the other productive (Keyfitz and Flieger 1990, p. 32). It is often assumed that persons younger than 15 or older than 65 are dependent on productive individuals between 15 and 65. The dependency ratio is defined as

$$D = \frac{\mathbf{a}^{\mathsf{T}} \hat{\mathbf{p}}}{\mathbf{b}^{\mathsf{T}} \hat{\mathbf{p}}} \tag{10.80}$$

where **a** is a vector with ones for the dependent ages and zeros otherwise, and **b** is its complement. Applying Eq. (10.33) for the sensitivity of a ratio gives

$$\frac{dD}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\frac{\mathbf{b}^{\mathsf{T}}\mathbf{\hat{p}}\mathbf{a}^{\mathsf{T}} - \mathbf{a}^{\mathsf{T}}\mathbf{\hat{p}}\mathbf{b}^{\mathsf{T}}}{\left(\mathbf{b}^{\mathsf{T}}\mathbf{\hat{p}}\right)^{2}}\right) \frac{d\boldsymbol{\hat{p}}}{d\boldsymbol{\theta}^{\mathsf{T}}}.\tag{10.81}$$

where $d\hat{\mathbf{p}}/d\boldsymbol{\theta}^{\mathsf{T}}$ is given by (10.73).

This result can be generalized in several ways. The analysis may be performed separately for the dependent young and the dependent old, by suitable modification of **a** and **b**. These two components are likely to be influenced by different demographic factors and can respond to perturbations in opposite directions. The 0-1 vectors **a** and **b** may be replaced by vectors of weights; e.g., age-specific consumption and age-specific income (Fürnkranz-Prskawetz and Sambt 2014). For an example applied to a population projection for Spain, see Caswell and Sanchez Gassen (2015). The analysis also applies to stage-classified models, provided that dependent and independent stages can be identified. It also applies to nonlinear models, with the stable stage distribution **p**ˆ replaced by the equilibrium population **n**ˆ in (10.81). It can be extended to transient dynamics, where the age distribution, and thus the dependency ratio, varies over time (Caswell 2007), as is the case in population projections (Caswell and Sanchez Gassen 2015). Finally, the sensitivity (10.81) makes it possible to carry out LTRE analyses to decompose differences in dependency ratios into components due to differences in each of the vital rates (see Chaps. 2, 8, and 9).
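
The bracketed ratio-derivative term of (10.81) is simple to implement on its own; post-multiplying it by $d\hat{\mathbf{p}}/d\boldsymbol{\theta}^{\mathsf{T}}$ from (10.73) then gives the full sensitivity. A sketch with a hypothetical age distribution (not the U.S. or Kuwaiti data) and the conventional under-15/over-65 dependency classes, checked against a central finite difference:

```python
import numpy as np

def dD_dp(p, a, b):
    """The bracketed term of Eq. (10.81): the row vector (b'p a' - a'p b') / (b'p)^2."""
    return ((b @ p) * a - (a @ p) * b) / (b @ p) ** 2

# hypothetical age distribution: 18 five-year classes, ages 0-90
rng = np.random.default_rng(1)
p = rng.random(18)
p /= p.sum()
a = np.zeros(18)
a[:3] = 1.0        # dependent young: ages 0-15
a[13:] = 1.0       # dependent old: ages 65-90
b = 1.0 - a        # productive: ages 15-65

D = (a @ p) / (b @ p)
g = dD_dp(p, a, b)

# central-difference check on one entry of p
h = 1e-6
e = np.zeros(18); e[5] = 1.0
fd = ((a @ (p + h * e)) / (b @ (p + h * e))
      - (a @ (p - h * e)) / (b @ (p - h * e))) / (2 * h)
print(np.allclose(g[5], fd, atol=1e-8))    # should print True
```

Replacing the 0-1 vectors `a` and `b` by age-specific weights requires no change to the code.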

**Example 5: (cont'd) Dependency ratios in human populations** The United States in 1985 had a set of vital rates leading to a low growth rate (*λ* = 0*.*975), and a relatively low dependency ratio, dominated by the old. Kuwait in 1970, in contrast, had a high growth rate (*λ* = 1*.*210) and one of the highest dependency ratios listed in the compilation of Keyfitz and Flieger (1990), dominated by the young:


where $D\_y$ and $D\_o$ are the dependency ratios calculated for the young and the old separately. The sensitivities of $D$, $D\_y$, and $D\_o$ to changes in age-specific survival and fertility are shown in Fig. 10.5. The responses of $D$ to changes in the vital rates differ between the two countries. In the U.S., increases in fertility would reduce $D$. In Kuwait, increases in fertility (especially at young ages) would increase $D$. In the U.S., increases in survival<sup>14</sup> before age 30 reduce $D$; increases after age 30 increase $D$. In Kuwait, increases in survival, except at very young and very old ages, reduce $D$.

Breaking $D$ into its young and old components helps to explain these differences. In both countries, there is a crossover in survival effects. Increases in survival at early ages increase $D\_y$ and reduce $D\_o$. At later ages, increases in survival reduce $D\_y$ and increase $D\_o$. Increases in fertility increase $D\_y$ and reduce $D\_o$. In the U.S. population, both these effects are large, with the negative effect on $D\_o$ larger than the positive effect on $D\_y$. In the Kuwaiti population, the positive effect on $D\_y$ is much greater than the negative effect on $D\_o$.

#### *10.5.4 Sensitivity of Mean Age and Related Quantities*

From an age distribution **p**ˆ, it is possible to compute the mean age of any age-specific property (e.g., production of children, collection of retirement benefits, exposure to toxic chemicals); see Chu (1998, p. 26) for general discussions. The most familiar of these is the mean age of reproduction, which is one measure of generation time (Coale 1972).

Let **f** be a vector of age-specific per-capita fertilities. The age distribution of offspring production is then **f** ◦ **p**ˆ, where ◦ is the Hadamard, or element-by-element product. The mean age of the mothers of these offspring is obtained by normalizing **f** ◦ **p**ˆ to sum to 1 and taking the mean over the resulting distribution,

<sup>14</sup>Or, equivalently, reductions in mortality. For these parameter values, the sensitivity to mortality is approximately the sensitivity to survival with the opposite sign.

**Fig. 10.5** Sensitivity of the dependency ratio *D*, and of its old and young components, to agespecific survival and fertility. Left: calculated from the stable age distribution of the United States in 1985. Right: calculated from the stable age distribution of Kuwait in 1970. (**a**) and (**b**): Sensitivity of *D* to survival (*Pi*) and fertility (*Fi*). (**c**) and (**d**): Sensitivity of the components of *D* to survival. (**e**) and (**f**): Sensitivity of the components of *D* to fertility. Life table data from Keyfitz and Flieger (1990)


$$\bar{a}\_{\mathbf{f}} = \frac{\mathbf{c}^{\mathsf{T}} \left(\mathbf{f} \circ \hat{\mathbf{p}}\right)}{\mathbf{1}^{\mathsf{T}} \left(\mathbf{f} \circ \hat{\mathbf{p}}\right)}\tag{10.82}$$

where

$$\mathbf{c}^{\mathsf{T}} = \begin{pmatrix} 1 \ 2 \ \cdots \ s \end{pmatrix},$$

with *s* as the last age class.

Now differentiate $\bar{a}\_{\mathbf{f}}$, following the now-familiar rules for ratios. The differential of the Hadamard product of two vectors is $d(\mathbf{a} \circ \mathbf{b}) = \mathcal{D}(\mathbf{a})\,d\mathbf{b} + \mathcal{D}(\mathbf{b})\,d\mathbf{a}$. The result is

$$\frac{d\bar{a}\_{\mathsf{f}}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\frac{\mathbf{1}^{\mathsf{T}} \left(\mathbf{f} \circ \hat{\mathbf{p}}\right) \mathbf{c}^{\mathsf{T}} - \mathbf{c}^{\mathsf{T}} \left(\mathbf{f} \circ \hat{\mathbf{p}}\right) \mathbf{1}^{\mathsf{T}}}{\left(\mathbf{f}^{\mathsf{T}}\hat{\mathbf{p}}\right)^{2}}\right) \left(\mathcal{D}\left(\mathbf{f}\right) \frac{d\hat{\mathbf{p}}}{d\boldsymbol{\theta}^{\mathsf{T}}} + \mathcal{D}\left(\hat{\mathbf{p}}\right) \frac{d\mathbf{f}}{d\boldsymbol{\theta}^{\mathsf{T}}}\right) \tag{10.83}$$

where $d\hat{\mathbf{p}}/d\boldsymbol{\theta}^{\mathsf{T}}$ is given by (10.73).
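
A numerical sketch of (10.83), restricted for simplicity to $\boldsymbol{\theta} = \hat{\mathbf{p}}$ (so that $d\mathbf{f}/d\boldsymbol{\theta}^{\mathsf{T}} = \mathbf{0}$ and $d\hat{\mathbf{p}}/d\boldsymbol{\theta}^{\mathsf{T}} = \mathbf{I}$); the fertility schedule and age distribution here are hypothetical, not the U.S. 1985 values:

```python
import numpy as np

def abar(f, p, c):
    """Mean age of offspring production, Eq. (10.82)."""
    w = f * p
    return (c @ w) / w.sum()

def dabar_dp(f, p, c):
    """Gradient of abar_f with respect to p: the ratio term of Eq. (10.83),
    post-multiplied by the diagonal matrix D(f). It simplifies to (c - abar) f / 1'(f o p)."""
    w = f * p
    return (c - abar(f, p, c)) / w.sum() * f

# hypothetical schedules: 18 five-year classes, fertility between ages 15 and 45
c = np.arange(2.5, 90.0, 5.0)                 # class midpoints in years
f = np.zeros(18)
f[3:9] = [0.1, 0.4, 0.5, 0.4, 0.2, 0.05]
rng = np.random.default_rng(0)
p = rng.random(18)
p /= p.sum()

g = dabar_dp(f, p, c)
h = 1e-6
e = np.zeros(18); e[4] = 1.0
fd = (abar(f, p + h * e, c) - abar(f, p - h * e, c)) / (2 * h)
print(np.allclose(g[4], fd, atol=1e-6))       # should print True
```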

This result can be generalized in several ways. Setting $\mathbf{f} = \mathbf{1}$ makes the age-specific property that of simply being alive, and $\bar{a}\_{\mathbf{1}} = \mathbf{c}^{\mathsf{T}}\hat{\mathbf{p}}$ is then the mean age of the stable population, the sensitivity of which is

$$\frac{d\bar{a}}{d\theta^{\mathsf{T}}} = \mathbf{c}^{\mathsf{T}} \frac{d\hat{\mathbf{p}}}{d\theta^{\mathsf{T}}} \tag{10.84}$$

The calculations can also be applied to the equilibrium population in a nonlinear model by substituting **n**ˆ for **p**ˆ. They apply directly to stage-classified models with stages defined on an interval scale (e.g., size classes), in which case they give, e.g., the mean size at reproduction. If the stages are not evenly spaced, then **c** would be replaced by

$$\mathbf{c}^{\mathsf{T}} = \left(x\_1 \ x\_2 \ \cdots \ x\_s\right) \tag{10.85}$$

where *xi* is the value associated with stage *i*.

**Example 5: (cont'd) Mean age of reproduction** The mean age of reproduction in the stable age distribution of the United States in 1985 was $\bar{a}\_{\mathbf{f}} = 24.02$ years (using the mid-points of the 5-year age intervals as the measure of age). The sensitivities of $\bar{a}\_{\mathbf{f}}$ to changes in age-specific survival and fertility are shown in Fig. 10.6. Increases in survival prior to age 15 reduce $\bar{a}\_{\mathbf{f}}$. Increases in survival after age 45 have almost no effect on $\bar{a}\_{\mathbf{f}}$, because fertility is essentially zero after this age. Between age 15 and age 45, increases in survival increase the mean age of reproduction.

Increases in fertility reduce $\bar{a}\_{\mathbf{f}}$ if they happen before age 25 and increase $\bar{a}\_{\mathbf{f}}$ if they happen after age 25. These sensitivities are quite large, although this is of limited practical importance, because the largest sensitivities occur at ages at which fertility is zero and unlikely to be modified.

**Fig. 10.6** Sensitivity of the mean age at reproduction to changes in age-specific survival and fertility, for the life table of the population of the United States, 1985. (Data from Keyfitz and Flieger 1990)

#### *10.5.5 Sensitivity of Variance in Age*

We can also calculate the sensitivity of the higher moments. For example, the variance in the age at reproduction is

$$V\_{\mathbf{f}} = \overline{a\_{\mathbf{f}}^2} - \left(\bar{a}\_{\mathbf{f}}\right)^2. \tag{10.86}$$

This variance might, for example, be useful as a measure of the extent of iteroparity. The sensitivity of *V***<sup>f</sup>** to changes in parameters is obtained by writing the first term as

$$\overline{a\_{\mathbf{f}}^{2}} = \frac{(\mathbf{c} \circ \mathbf{c})^{\mathsf{T}} \left(\mathbf{f} \circ \hat{\mathbf{p}}\right)}{\mathbf{1}^{\mathsf{T}} \left(\mathbf{f} \circ \hat{\mathbf{p}}\right)}\tag{10.87}$$

and then differentiating

$$dV\_{\mathbf{f}} = d\left(\overline{a\_{\mathbf{f}}^2}\right) - 2\bar{a}\_{\mathbf{f}}\left(d\bar{a}\_{\mathbf{f}}\right). \tag{10.88}$$

The final result is

$$\frac{dV\_{\mathbf{f}}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\frac{\mathbf{1}^{\mathsf{T}}(\mathbf{f}\circ\hat{\mathbf{p}})(\mathbf{c}\circ\mathbf{c})^{\mathsf{T}} - (\mathbf{c}\circ\mathbf{c})^{\mathsf{T}}\left(\mathbf{f}\circ\hat{\mathbf{p}}\right)\mathbf{1}^{\mathsf{T}}}{\left(\mathbf{f}^{\mathsf{T}}\hat{\mathbf{p}}\right)^{2}}\right) \left(\mathcal{D}\left(\mathbf{f}\right)\frac{d\hat{\mathbf{p}}}{d\boldsymbol{\theta}^{\mathsf{T}}} + \mathcal{D}\left(\hat{\mathbf{p}}\right)\frac{d\mathbf{f}}{d\boldsymbol{\theta}^{\mathsf{T}}}\right) - 2\bar{a}\_{\mathbf{f}}\,\frac{d\bar{a}\_{\mathbf{f}}}{d\boldsymbol{\theta}^{\mathsf{T}}} \tag{10.89}$$

where $d\hat{\mathbf{p}}/d\boldsymbol{\theta}^{\mathsf{T}}$ is given by (10.73) and $d\bar{a}\_{\mathbf{f}}/d\boldsymbol{\theta}^{\mathsf{T}}$ is given by (10.83).

#### **10.6 Frequency-Dependent Two-Sex Models**

In sexually reproducing species, a particular sort of nonlinearity arises from the dependence of reproduction on the relative abundance of males and females. This dependence is captured in a marriage function or mating rule (e.g., McFarland 1972; Pollak 1987, 1990). When the vital rates depend only on the relative, rather than the absolute, abundance of males and females, then $\mathbf{A}[\boldsymbol{\theta}, \mathbf{n}]$ is homogeneous of degree 0 in **n**; i.e.,

$$\mathbf{A}[\theta, c\mathbf{n}] = \mathbf{A}[\theta, \mathbf{n}] \qquad \text{for any } c \neq 0. \tag{10.90}$$

Such models have been called frequency-dependent (Caswell and Weeks 1986; Caswell 2001) to distinguish them from density-dependent nonlinear models that do not have this homogeneity property.

Because of the homogeneity of $\mathbf{A}[\boldsymbol{\theta}, \mathbf{n}]$, frequency-dependent models do not converge to an equilibrium density $\hat{\mathbf{n}}$. Instead, there may exist<sup>15</sup> a stable equilibrium proportional structure $\hat{\mathbf{p}}$ to which the population will converge, at which point it grows exponentially at a rate *λ* given by the dominant eigenvalue of $\mathbf{A}[\boldsymbol{\theta}, \hat{\mathbf{p}}]$. Thus sensitivity analysis of two-sex models must include both the population structure and the population growth rate.

Note that matrix models that include Mendelian genetics are also homogeneous of degree zero, but it is confusing to call them frequency-dependent, because doing so creates confusion with the genetic phenomenon of frequency-dependent fitness, which is a different thing altogether (de Vries and Caswell 2018).

#### *10.6.1 Sensitivity of the Population Structure*

The equilibrium proportional population structure **p**ˆ satisfies

$$
\hat{\mathbf{p}} = \frac{\mathbf{A}[\boldsymbol{\theta}, \hat{\mathbf{p}}] \ \hat{\mathbf{p}}}{\|\mathbf{A}[\boldsymbol{\theta}, \hat{\mathbf{p}}] \ \hat{\mathbf{p}}\|} \tag{10.91}
$$

where $\hat{p}\_i \ge 0$ and $\mathbf{1}^{\mathsf{T}}\hat{\mathbf{p}} = 1$. Differentiating (10.91) gives

$$d\hat{\mathbf{p}} = \frac{\mathbf{1}^{\mathsf{T}} \mathbf{A} \hat{\mathbf{p}} \left[ (d\mathbf{A}) \hat{\mathbf{p}} + \mathbf{A} (d\hat{\mathbf{p}}) \right] - \mathbf{A} \hat{\mathbf{p}} \left[ \mathbf{1}^{\mathsf{T}} (d\mathbf{A}) \hat{\mathbf{p}} + \mathbf{1}^{\mathsf{T}} \mathbf{A} (d\hat{\mathbf{p}}) \right]}{\left( \mathbf{1}^{\mathsf{T}} \mathbf{A} \hat{\mathbf{p}} \right)^{2}}. \tag{10.92}$$

<sup>15</sup>A sufficient, but not necessary, condition for the existence of an equilibrium is that **A** cannot map a nonzero vector **n** directly to zero; necessary conditions are more difficult (Nussbaum 1988, 1989). See also Martcheva (1999).

Making the substitutions $\mathbf{A}\hat{\mathbf{p}} = \lambda\hat{\mathbf{p}}$ and $\mathbf{1}^{\mathsf{T}}\mathbf{A}\hat{\mathbf{p}} = \lambda$ and rearranging gives

$$
\lambda d\hat{\mathbf{p}} = (d\mathbf{A})\hat{\mathbf{p}} + \mathbf{A}(d\hat{\mathbf{p}}) - \hat{\mathbf{p}}\mathbf{1}^{\mathsf{T}}(d\mathbf{A})\hat{\mathbf{p}} - \hat{\mathbf{p}}\mathbf{1}^{\mathsf{T}}\mathbf{A}(d\hat{\mathbf{p}}).\tag{10.93}
$$

Applying the vec operator to both sides, expanding $d\text{vec}\,\mathbf{A}$, invoking the chain rule, and solving for $d\hat{\mathbf{p}}/d\boldsymbol{\theta}^{\mathsf{T}}$ gives

$$\frac{d\hat{\mathbf{p}}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left[\lambda\mathbf{I}\_{s} - \mathbf{A} + \hat{\mathbf{p}}\mathbf{1}^{\mathsf{T}}\mathbf{A} - \left[\hat{\mathbf{p}}^{\mathsf{T}}\otimes\left(\mathbf{I}\_{s} - \hat{\mathbf{p}}\mathbf{1}^{\mathsf{T}}\right)\right] \frac{\partial\text{vec}\,\mathbf{A}}{\partial\mathbf{p}^{\mathsf{T}}}\right]^{-1}$$

$$\times \left[\hat{\mathbf{p}}^{\mathsf{T}}\otimes\left(\mathbf{I}\_{s} - \hat{\mathbf{p}}\mathbf{1}^{\mathsf{T}}\right)\right] \frac{\partial\text{vec}\,\mathbf{A}}{\partial\boldsymbol{\theta}^{\mathsf{T}}} \qquad (10.94)$$

where **A** and all derivatives are evaluated at $\hat{\mathbf{p}}$. Note that (10.94) differs from the expression (10.73) for the stable stage distribution in the linear model only in the term involving $\partial\text{vec}\,\mathbf{A}/\partial\mathbf{p}^{\mathsf{T}}$, which of course is zero in the linear model.
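
A minimal sketch of (10.94), using a hypothetical two-sex model with one stage per sex and a harmonic-mean marriage function (this model is invented for illustration; it does not appear in the text). For these parameter values the equilibrium can be found exactly ($\hat{\mathbf{p}} = (7/15,\ 8/15)$, $\lambda = 1.1$), and the sensitivity from (10.94) agrees with a brute-force recomputation of the equilibrium:

```python
import numpy as np

def A_of(p, k, rho, sf, sm):
    """Hypothetical two-sex matrix: one stage per sex, harmonic-mean marriage.
    Per-female fertility F = 2 k p_m / (p_f + p_m) is homogeneous of degree 0."""
    F = 2.0 * k * p[1] / (p[0] + p[1])
    return np.array([[sf + rho * F, 0.0],
                     [(1.0 - rho) * F, sm]])

def equilibrium(k, rho, sf, sm, n_iter=1000):
    p = np.array([0.5, 0.5])
    for _ in range(n_iter):               # iterate the normalized dynamics (10.91)
        q = A_of(p, k, rho, sf, sm) @ p
        p = q / q.sum()
    return p

k, rho, sf, sm = 1.5, 0.5, 0.3, 0.4
p = equilibrium(k, rho, sf, sm)           # equals (7/15, 8/15) for these values
A = A_of(p, k, rho, sf, sm)
lam = np.ones(2) @ A @ p                  # equals 1.1 for these values
I, ones = np.eye(2), np.ones(2)

# analytic Jacobians of vec A (column-major) at the equilibrium
F = 2.0 * k * p[1] / (p[0] + p[1])
Fp = 2.0 * k * np.array([-p[1], p[0]]) / (p[0] + p[1]) ** 2     # dF/dp'
dvecA_dp = np.vstack([rho * Fp, (1 - rho) * Fp, np.zeros(2), np.zeros(2)])
dvecA_dth = np.array([[rho * F / k, F, 1.0, 0.0],               # theta = (k, rho, sf, sm)
                      [(1 - rho) * F / k, -F, 0.0, 0.0],
                      [0.0, 0.0, 0.0, 0.0],
                      [0.0, 0.0, 0.0, 1.0]])

# Eq. (10.94)
H = np.kron(p[None, :], I - np.outer(p, ones))
M = lam * I - A + np.outer(p, ones) @ A - H @ dvecA_dp
dp_dth = np.linalg.solve(M, H @ dvecA_dth)

# brute-force check on the fertility parameter k
h = 1e-6
fd = (equilibrium(k + h, rho, sf, sm) - equilibrium(k - h, rho, sf, sm)) / (2 * h)
print(np.allclose(dp_dth[:, 0], fd, atol=1e-6))   # should print True
```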

#### *10.6.2 Population Growth Rate in Two-Sex Models*

Because a population with the equilibrium structure grows exponentially, I once suggested treating $\mathbf{A}[\boldsymbol{\theta}, \hat{\mathbf{p}}]$ as a constant matrix and applying eigenvalue sensitivity analysis to it, in order to examine life history evolution in two-sex models (Caswell 2001, p. 577). This was incorrect, because it ignored the effect of parameter changes on **A** through their effects on the equilibrium $\hat{\mathbf{p}}$. A correct calculation obtains the sensitivity of *λ* including the effects of parameters on both **A** and $\hat{\mathbf{p}}$.

Note that $\hat{\mathbf{p}}$ is a right eigenvector of $\mathbf{A}[\boldsymbol{\theta}, \hat{\mathbf{p}}]$ corresponding to *λ*. Let **v** be the corresponding left eigenvector, where $\mathbf{v}^{\mathsf{T}}\mathbf{A}[\boldsymbol{\theta}, \hat{\mathbf{p}}] = \lambda\mathbf{v}^{\mathsf{T}}$ and $\mathbf{v}^{\mathsf{T}}\hat{\mathbf{p}} = 1$. Then

$$d\lambda = \mathbf{v}^{\mathsf{T}}(d\mathbf{A})\hat{\mathbf{p}}\tag{10.95}$$

(Caswell 1978). Applying the vec operator and Roth's theorem gives

$$d\lambda = \left(\hat{\mathbf{p}}^{\mathsf{T}} \otimes \mathbf{v}^{\mathsf{T}}\right) d\mathsf{vec} \,\mathbf{A}.\tag{10.96}$$

Expanding *d*vec **A** gives

$$\frac{d\boldsymbol{\lambda}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\hat{\mathbf{p}}^{\mathsf{T}} \otimes \mathbf{v}^{\mathsf{T}}\right) \left[\frac{\partial \text{vec}\,\mathbf{A}}{\partial \boldsymbol{\theta}^{\mathsf{T}}} + \frac{\partial \text{vec}\,\mathbf{A}}{\partial \hat{\mathbf{p}}^{\mathsf{T}}} \frac{d\hat{\mathbf{p}}}{d\boldsymbol{\theta}^{\mathsf{T}}}\right] \tag{10.97}$$

where **A**, **v**, and the derivatives of **A** are all evaluated at the equilibrium $\hat{\mathbf{p}}$, and $d\hat{\mathbf{p}}/d\boldsymbol{\theta}^{\mathsf{T}}$ is given by (10.94).
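
The difference between (10.97) and the naive fixed-matrix calculation is easy to demonstrate numerically. A sketch using a hypothetical one-stage-per-sex model with a harmonic-mean marriage function (invented for illustration, not from the text); here the Jacobians of $\text{vec}\,\mathbf{A}$ and $d\hat{\mathbf{p}}/d\boldsymbol{\theta}^{\mathsf{T}}$ are obtained by finite differences rather than from (10.94):

```python
import numpy as np

def A_of(p, k):
    """Hypothetical two-sex matrix: one stage per sex, harmonic-mean marriage."""
    F = 2.0 * k * p[1] / (p[0] + p[1])
    return np.array([[0.3 + 0.5 * F, 0.0],
                     [0.5 * F, 0.4]])

def equilibrium(k, n_iter=1000):
    p = np.array([0.5, 0.5])
    for _ in range(n_iter):
        q = A_of(p, k) @ p
        p = q / q.sum()
    return p

def growth_rate(k):
    p = equilibrium(k)
    return np.ones(2) @ A_of(p, k) @ p

k = 1.5
p = equilibrium(k)
A = A_of(p, k)
vals, vecs = np.linalg.eig(A.T)
v = vecs[:, np.argmax(vals.real)].real
v = v / (v @ p)                        # left eigenvector scaled so v'p = 1

# Jacobians of vec A (column-major) by central differences
h = 1e-6
dvecA_dk = ((A_of(p, k + h) - A_of(p, k - h)) / (2 * h)).flatten(order='F')
dvecA_dp = np.column_stack(
    [(A_of(p + h * e, k) - A_of(p - h * e, k)).flatten(order='F') / (2 * h)
     for e in np.eye(2)])
dp_dk = (equilibrium(k + h) - equilibrium(k - h)) / (2 * h)

naive = np.kron(p, v) @ dvecA_dk                            # ignores dp/dk
correct = np.kron(p, v) @ (dvecA_dk + dvecA_dp @ dp_dk)     # Eq. (10.97)
true_fd = (growth_rate(k + h) - growth_rate(k - h)) / (2 * h)
print(round(correct, 4), round(naive, 4), round(true_fd, 4))
```

For this model the correct and brute-force values agree, while the naive value is biased upward because it ignores the shift in the equilibrium sex ratio.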

Note that *λ* is the invasion exponent for this model, and thus the sensitivity of *λ* to a parameter gives the selection gradient on that parameter. Tuljapurkar et al. (2007) used this fact to explore the effect of male fertility patterns on the evolution of aging; the sensitivity (10.97) could be used to generalize such results. Recent work by Shyu has coupled these calculations to the methods of adaptive dynamics to examine the evolution of sex ratios (Shyu and Caswell 2016a,b).

Although two-sex models are an important case of homogeneous models, they are not the only case. Keyfitz's (1972) interpretation of the Easterlin hypothesis describes fertility as dependent on only the relative, not absolute, size of a cohort. A model based on this premise would be frequency-dependent (homogeneous) and would lead to an exponentially growing population to which (10.97) would be applicable.

**Example 6: A two-sex model for passerine birds** Legendre et al. (1999) used a frequency-dependent two-sex model to study the introductions of passerine birds to New Zealand. The life cycle includes two age classes (first year and older) for females and for males. The life cycle graph is shown in Fig. 10.7. The numbers of females and males are $N\_f = n\_1 + n\_2$ and $N\_m = n\_3 + n\_4$, respectively.

Because passerines are typically monogamous within a breeding season, and assuming that mating is indiscriminate with respect to age, Legendre et al. (1999) used the mating function

$$B(\mathbf{n}) = \min\left(N\_f, N\_m\right),\tag{10.98}$$

giving the number of matings as a function of the numbers of males and females. The per-capita fertility of a female of age class *i* is the number of matings divided by the number of females, multiplied by the number of surviving offspring per mating:

$$F\_i(\mathbf{n}) = \frac{\sigma\_0 \phi\_i B(\mathbf{n})}{N\_f} \tag{10.99}$$

$$F\_i(\mathbf{n}) = \begin{cases} \sigma\_0 \phi\_i \frac{N\_m}{N\_f} & N\_f \ge N\_m \\ \sigma\_0 \phi\_i & N\_f < N\_m \end{cases} \tag{10.100}$$

where $\sigma\_0$ is the probability of survival from fledging to age 1 and $\phi\_i$ is the clutch size of age class *i*. When males are the scarcer sex (the avian equivalent of a marriage squeeze), fertility is proportional to the ratio of males to females. When females are the scarcer sex, all females are mated and fertility depends only on fecundity and neonatal survival.

Births are allocated to females and males according to a primary sex ratio *ρ* which gives the proportion female. The resulting two-sex projection matrix is

$$\mathbf{A}[\mathbf{n}] = \begin{pmatrix} \rho F\_1(\mathbf{n}) & \rho F\_2(\mathbf{n}) & 0 & 0\\ \sigma\_1 & \sigma\_2 & 0 & 0\\ (1-\rho)F\_1(\mathbf{n}) & (1-\rho)F\_2(\mathbf{n}) & 0 & 0\\ 0 & 0 & \sigma\_3 & \sigma\_4 \end{pmatrix} \tag{10.101}$$

Legendre et al. (1999) assigned typical values for passerine birds of $\sigma\_0 = 0.2$, $\phi\_i = 7$, and $\rho = 0.5$. They set male and female survival equal ($\sigma\_1 = \sigma\_3 = 0.35$, $\sigma\_2 = \sigma\_4 = 0.4$), but this is a pathological special case in this model, so instead I consider two cases: one in which male mortality is higher than female mortality, and one in which the difference is reversed.<sup>16</sup> The survival probabilities and equilibrium population structures are

$$\boldsymbol{\sigma} = \begin{pmatrix} 0.35\\ 0.5\\ 0.25\\ 0.4 \end{pmatrix} \qquad \hat{\mathbf{p}} = \begin{pmatrix} 0.320\\ 0.226\\ 0.320\\ 0.134 \end{pmatrix} \tag{10.102}$$

$$\boldsymbol{\sigma} = \begin{pmatrix} 0.25 \\ 0.4 \\ 0.35 \\ 0.5 \end{pmatrix} \qquad \hat{\mathbf{p}} = \begin{pmatrix} 0.320 \\ 0.134 \\ 0.320 \\ 0.226 \end{pmatrix} \tag{10.103}$$
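
A sketch of how such equilibria can be computed: iterate the normalized dynamics (10.91) with the matrix (10.101). The code assumes equal clutch sizes for both age classes ($\phi\_1 = \phi\_2 = 7$, as in the text) and reproduces the first case, Eq. (10.102), to three decimals:

```python
import numpy as np

def A_passerine(n, sigma, sigma0=0.2, phi=7.0, rho=0.5):
    """Two-sex passerine matrix, Eq. (10.101), with the minimum mating function
    of Eqs. (10.98)-(10.100); equal clutch sizes phi for both age classes."""
    Nf, Nm = n[0] + n[1], n[2] + n[3]
    F = sigma0 * phi * min(Nf, Nm) / Nf
    s1, s2, s3, s4 = sigma
    return np.array([[rho * F, rho * F, 0.0, 0.0],
                     [s1, s2, 0.0, 0.0],
                     [(1 - rho) * F, (1 - rho) * F, 0.0, 0.0],
                     [0.0, 0.0, s3, s4]])

def equilibrium(sigma, n_iter=5000):
    """Iterate the normalized dynamics, Eq. (10.91)."""
    p = np.full(4, 0.25)
    for _ in range(n_iter):
        q = A_passerine(p, sigma) @ p
        p = q / q.sum()
    return p

sigma = [0.35, 0.5, 0.25, 0.4]     # higher male mortality: males rare
p_hat = equilibrium(sigma)
lam = (A_passerine(p_hat, sigma) @ p_hat).sum()
print(np.round(p_hat, 3), round(float(lam), 3))
# approximately [0.32 0.226 0.32 0.134] and 0.994: a declining population
```

Swapping the female and male survival values gives the second case, Eq. (10.103).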

The elasticities of **p**ˆ to each of the parameters, calculated from (10.94), are shown in Table 10.1. Regardless of which sex is scarcer, increasing neonatal survival increases the proportion of young, at the expense of the proportion of adults, in both sexes. Increasing the sex ratio *ρ* increases the proportion of females at the expense of males. Increasing female survival ($\sigma\_1$ or $\sigma\_2$) increases the proportion of adult females at the expense of all other stages; increasing male survival has the opposite effect. However, when females are rare, increasing female survival has no effect on the proportion of juveniles. When males are rare, increases in male survival have no effect on the proportion of juveniles. Increasing fecundity increases the proportion of juveniles, at the expense of adults, in both sexes and for either mortality pattern.

<sup>16</sup>In a survey of the literature, adult mortality for female passerines exceeded that for males in 21 out of 28 cases (Promislow et al. 1992). Birds differ from mammals in this respect.

**Table 10.1** Elasticity of **p**ˆ to parameters in the two-sex model for passerine birds, under two mortality scenarios. When male mortality is greater than female mortality, males are rarer than females

Males rare:

| Stage | $\sigma\_0$ | $\rho$ | $\sigma\_1$ | $\sigma\_2$ | $\sigma\_3$ | $\sigma\_4$ | $\phi\_1$ | $\phi\_2$ |
|---|---|---|---|---|---|---|---|---|
| $\hat{p}\_4$ | −0.664 | −0.428 | −0.226 | −0.229 | 0.669 | 0.450 | −0.389 | −0.275 |

Females rare:

| Stage | $\sigma\_0$ | $\rho$ | $\sigma\_1$ | $\sigma\_2$ | $\sigma\_3$ | $\sigma\_4$ | $\phi\_1$ | $\phi\_2$ |
|---|---|---|---|---|---|---|---|---|
| $\hat{p}\_1$ | 0.455 | 1.547 | 0.000 | 0.000 | −0.226 | −0.229 | 0.320 | 0.135 |
| $\hat{p}\_2$ | −0.664 | 0.428 | 0.669 | 0.450 | −0.226 | −0.229 | −0.467 | −0.197 |
| $\hat{p}\_3$ | 0.455 | −0.453 | 0.000 | 0.000 | −0.226 | −0.229 | 0.320 | 0.135 |
| $\hat{p}\_4$ | −0.890 | −1.799 | −0.398 | −0.268 | 0.774 | 0.783 | −0.627 | −0.264 |

The elasticity of the population growth rate *λ* at equilibrium is shown in Table 10.2, and is compared to the naive calculation that treats **A**[*θ,* **p**ˆ] as a fixed matrix. When males are rare, so that fertility is limited by the mating function, the naive calculations are dramatically wrong. When calculated correctly, increases in the primary sex ratio *ρ* reduce *λ*, because they reduce the availability of males. Increases in female survival have no effect on *λ*, because the extra females produced have no opportunity to reproduce. Increases in male survival increase *λ* because they increase female fertility. In each case, the naive calculation leads, incorrectly, to the opposite conclusion.

When females are rare (which renders the model linear and female-dominant at equilibrium), the correct and the naive calculations agree. This is a consequence of using the minimum as a birth function. Some preliminary calculations using the harmonic mean birth function,

$$B(\mathbf{n}) = \frac{2N\_f N\_m}{N\_f + N\_m},\tag{10.104}$$

in which both males and females influence fertility at all population structures, suggest that the naive elasticity calculations are always incorrect.

Sometimes the correct calculations lead to apparent paradoxes. Jenouvrier et al. (2010) developed a two-sex model for the Emperor penguin. It was a periodic model, with phases defined by events within the breeding cycle (cf. Chap. 8), and included a mating function applied to adults at the breeding colony. Because Emperor penguins breed, and share parental care, in the midst of the Antarctic winter,<sup>17</sup> they must be strictly monogamous, and hence Jenouvrier used the minimum as a mating function.

Analysis of the equilibrium growth rate revealed that the sensitivity of *λ* to adult female survival was negative. This is impossible in a linear model, but happens in this frequency-dependent model because increasing adult female survival increases the proportion of females (already greater than the proportion of males) and thus decreases the mating probability. The negative effect of reduced mating overwhelms the positive effect of improved adult survival; the net result is a reduction in population growth rate; see Jenouvrier et al. (2010) for details.

#### *10.6.3 The Birth Matrix-Mating Rule Model*

Pollak (1987, 1990) introduced a powerful conceptual approach to two-sex models, which he called the birth matrix-mating rule (BMMR) model. This model separates the processes of mating, birth, and life cycle stage transitions, and treats them as a periodic process. When generalized to stage-structured models, it contains three main components: a mating rule, a birth matrix, and a set of stage transitions.


<sup>17</sup>Dramatically portrayed in the movie, *March of the Penguins*.

A matrix version of the BMMR has recently been developed, using a novel continuous-time formulation of periodic matrix models (Shyu and Caswell 2018). The mating, birth, and transition processes are described, respectively, by matrices **U**, **B**, and **T**. To explore the theoretical consequences of two-sex reproduction, the matrices are parameterized in terms of continuous-time rates rather than discrete-time probabilities. In continuous time, the periodic matrix product that would describe such a process in discrete time converges to a sum of the rate matrices. The dynamics of the population are given by

$$\frac{d\mathbf{n}(t)}{dt} = \mathbf{A}\left[\mathbf{n}(t)\right] \ \mathbf{n}(t) \tag{10.105}$$

where

$$\mathbf{A}\left[\mathbf{n}(t)\right] = \frac{1}{3}\left(\mathbf{T} + \mathbf{B} + \mathbf{U}[\mathbf{n}(t)]\right) \tag{10.106}$$

That is, the projection matrix is the mean of the three component matrices, and is nonlinear because of the dependence of union formation (the matrix **U**) on **n**. Shyu and Caswell (2016a,b, 2018) explore this model in the context of sex ratio evolution and of sex-biased harvesting, deriving the sensitivity of the population growth rate as a measure of the selection gradient.

#### **10.7 Sensitivity of Population Cycles**

Equilibria are not the only attractors relevant in nature (e.g., Clutton-Brock et al. 1997) or the laboratory (Cushing et al. 2003). Cycles, invariant loops, and strange attractors also occur, and are sensitive to changes in parameters. This section examines the sensitivity of cycles.

#### *10.7.1 Sensitivity of the Population Vector*

A *k*-cycle is a sequence of population vectors $\hat{\mathbf{n}}\_1, \ldots, \hat{\mathbf{n}}\_k$ satisfying

$$
\hat{\mathbf{n}}\_{i+1} = \mathbf{A} \left[ \boldsymbol{\theta}, \hat{\mathbf{n}}\_i \right] \hat{\mathbf{n}}\_i \qquad i = 1, \ldots, k - 1
$$

$$
\hat{\mathbf{n}}\_1 = \mathbf{A} \left[ \boldsymbol{\theta}, \hat{\mathbf{n}}\_k \right] \hat{\mathbf{n}}\_k. \tag{10.107}
$$

A change in parameters will modify each point in the cycle; the first goal of perturbation analysis is thus to find the sensitivities

$$\frac{d\hat{\mathbf{n}}\_1}{d\boldsymbol{\theta}^{\mathsf{T}}}, \ldots, \frac{d\hat{\mathbf{n}}\_k}{d\boldsymbol{\theta}^{\mathsf{T}}}. \tag{10.108}$$

The following is the derivation of these sensitivities for a 2-cycle; the extension to cycles of arbitrary length follows the same pattern. To simplify notation, define

$$\mathbf{A}\_i \equiv \mathbf{A} \left[ \boldsymbol{\theta}, \hat{\mathbf{n}}\_i \right]. \tag{10.109}$$

The 2-cycle satisfies

$$
\hat{\mathbf{n}}\_1 = \mathbf{A}\_2 \hat{\mathbf{n}}\_2 \tag{10.110}
$$

$$
\hat{\mathbf{n}}\_2 = \mathbf{A}\_1 \hat{\mathbf{n}}\_1 \tag{10.111}
$$

Differentiating both equations, applying the vec operator, and expanding $d\text{vec}\,\mathbf{A}\_i/d\boldsymbol{\theta}^{\mathsf{T}}$ yields the system of equations

$$\frac{d\hat{\mathbf{n}}\_1}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\hat{\mathbf{n}}\_2^{\mathsf{T}} \otimes \mathbf{I}\_s\right) \frac{\partial \text{vec}\,\mathbf{A}\_2}{\partial \boldsymbol{\theta}^{\mathsf{T}}} + \left(\hat{\mathbf{n}}\_2^{\mathsf{T}} \otimes \mathbf{I}\_s\right) \frac{\partial \text{vec}\,\mathbf{A}\_2}{\partial \mathbf{n}\_2^{\mathsf{T}}} \frac{d\hat{\mathbf{n}}\_2}{d\boldsymbol{\theta}^{\mathsf{T}}} + \mathbf{A}\_2 \frac{d\hat{\mathbf{n}}\_2}{d\boldsymbol{\theta}^{\mathsf{T}}} \tag{10.112}$$

$$\frac{d\hat{\mathbf{n}}\_2}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\hat{\mathbf{n}}\_1^{\mathsf{T}} \otimes \mathbf{I}\_s\right) \frac{\partial \text{vec}\,\mathbf{A}\_1}{\partial \boldsymbol{\theta}^{\mathsf{T}}} + \left(\hat{\mathbf{n}}\_1^{\mathsf{T}} \otimes \mathbf{I}\_s\right) \frac{\partial \text{vec}\,\mathbf{A}\_1}{\partial \mathbf{n}\_1^{\mathsf{T}}} \frac{d\hat{\mathbf{n}}\_1}{d\boldsymbol{\theta}^{\mathsf{T}}} + \mathbf{A}\_1 \frac{d\hat{\mathbf{n}}\_1}{d\boldsymbol{\theta}^{\mathsf{T}}} \tag{10.113}$$

This system can be written in block matrix form. Define $\mathbf{H}\_i \equiv \hat{\mathbf{n}}\_i^{\mathsf{T}} \otimes \mathbf{I}\_s$. Then

$$\begin{aligned} \frac{d}{d\boldsymbol{\theta}^{\mathsf{T}}} \begin{pmatrix} \hat{\mathbf{n}}\_1 \\ \hat{\mathbf{n}}\_2 \end{pmatrix} &= \begin{pmatrix} \mathbf{0} & \mathbf{H}\_2 \\ \mathbf{H}\_1 & \mathbf{0} \end{pmatrix} \begin{pmatrix} \frac{\partial \text{vec}\,\mathbf{A}\_1}{\partial \boldsymbol{\theta}^{\mathsf{T}}} \\ \frac{\partial \text{vec}\,\mathbf{A}\_2}{\partial \boldsymbol{\theta}^{\mathsf{T}}} \end{pmatrix} \\ &\quad + \left[ \begin{pmatrix} \mathbf{0} & \mathbf{H}\_2 \\ \mathbf{H}\_1 & \mathbf{0} \end{pmatrix} \begin{pmatrix} \frac{\partial \text{vec}\,\mathbf{A}\_1}{\partial \mathbf{n}\_1^{\mathsf{T}}} & \mathbf{0} \\ \mathbf{0} & \frac{\partial \text{vec}\,\mathbf{A}\_2}{\partial \mathbf{n}\_2^{\mathsf{T}}} \end{pmatrix} + \begin{pmatrix} \mathbf{0} & \mathbf{A}\_2 \\ \mathbf{A}\_1 & \mathbf{0} \end{pmatrix} \right] \frac{d}{d\boldsymbol{\theta}^{\mathsf{T}}} \begin{pmatrix} \hat{\mathbf{n}}\_1 \\ \hat{\mathbf{n}}\_2 \end{pmatrix} \end{aligned} \tag{10.114}$$

Solving for the sensitivities gives

$$\frac{d}{d\boldsymbol{\theta}^{\mathsf{T}}} \begin{pmatrix} \hat{\mathbf{n}}\_1 \\ \hat{\mathbf{n}}\_2 \end{pmatrix} = \left[ \mathbf{I}\_{2s} - \begin{pmatrix} \mathbf{0} & \mathbf{H}\_2 \\ \mathbf{H}\_1 & \mathbf{0} \end{pmatrix} \begin{pmatrix} \frac{\partial \text{vec}\,\mathbf{A}\_1}{\partial \mathbf{n}\_1^{\mathsf{T}}} & \mathbf{0} \\ \mathbf{0} & \frac{\partial \text{vec}\,\mathbf{A}\_2}{\partial \mathbf{n}\_2^{\mathsf{T}}} \end{pmatrix} - \begin{pmatrix} \mathbf{0} & \mathbf{A}\_2 \\ \mathbf{A}\_1 & \mathbf{0} \end{pmatrix} \right]^{-1} \begin{pmatrix} \mathbf{0} & \mathbf{H}\_2 \\ \mathbf{H}\_1 & \mathbf{0} \end{pmatrix} \begin{pmatrix} \frac{\partial \text{vec}\,\mathbf{A}\_1}{\partial \boldsymbol{\theta}^{\mathsf{T}}} \\ \frac{\partial \text{vec}\,\mathbf{A}\_2}{\partial \boldsymbol{\theta}^{\mathsf{T}}} \end{pmatrix} \quad (10.115)$$

where the matrices $\mathbf{A}\_i$ and the derivatives of $\mathbf{A}\_i$ are all evaluated at $\hat{\mathbf{n}}\_i$. The analogy with (10.16) is apparent.

This calculation can be extended to cycles of any period, in terms of block matrices as in (10.115). The pattern of the block matrices is clear from a 3-cycle. Define the following matrices:

$$\mathbb{N} = \begin{pmatrix} \hat{\mathbf{n}}\_1 \\ \hat{\mathbf{n}}\_2 \\ \hat{\mathbf{n}}\_3 \end{pmatrix} \tag{10.116}$$

$$\mathbb{A} = \begin{pmatrix} 0 & 0 & \mathbf{A}\_3 \\ \mathbf{A}\_1 & 0 & 0 \\ 0 & \mathbf{A}\_2 & 0 \end{pmatrix} \tag{10.117}$$

$$
\mathbb{H} = \begin{pmatrix} 0 & 0 & \mathbf{H}\_3 \\ \mathbf{H}\_1 & 0 & 0 \\ 0 & \mathbf{H}\_2 & 0 \end{pmatrix} \tag{10.118}
$$

$$\mathbb{C} = \begin{pmatrix} \frac{\partial \text{vec} \, \mathbf{A}\_1}{\partial \mathbf{n}\_1^\mathsf{T}} & 0 & 0\\ 0 & \frac{\partial \text{vec} \, \mathbf{A}\_2}{\partial \mathbf{n}\_2^\mathsf{T}} & 0\\ 0 & 0 & \frac{\partial \text{vec} \, \mathbf{A}\_3}{\partial \mathbf{n}\_3^\mathsf{T}} \end{pmatrix} \tag{10.119}$$

$$\mathbb{D} = \begin{pmatrix} \frac{\partial \text{vec} \, \mathbf{A}\_1}{\partial \boldsymbol{\theta}^\mathsf{T}}\\ \frac{\partial \text{vec} \, \mathbf{A}\_2}{\partial \boldsymbol{\theta}^\mathsf{T}}\\ \frac{\partial \text{vec} \, \mathbf{A}\_3}{\partial \boldsymbol{\theta}^\mathsf{T}} \end{pmatrix}. \tag{10.120}$$

In terms of these matrices, the sensitivity of each point in the 3-cycle is given by

$$\frac{d\mathbb{N}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left[\mathbf{I}\_{3s} - \mathbb{A} - \mathbb{H}\mathbb{C}\right]^{-1}\mathbb{H}\mathbb{D}.\tag{10.121}$$
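A small numerical sketch may help fix the pattern of (10.121). The following Python code assembles the block matrices for a 2-cycle and checks the result against finite differences; the scalar ($s=1$) Ricker map $n(t+1) = \theta e^{-n}\, n$, which has a stable 2-cycle at $\theta = 10$, and all variable names are our illustrative choices, not part of the text.

```python
import numpy as np

# Eq. (10.121) for a 2-cycle of the scalar Ricker map
# n(t+1) = A[theta, n] n with A[theta, n] = theta * exp(-n).

theta = 10.0

def f(th, n):                      # one projection step
    return th * np.exp(-n) * n

def cycle_points(th, n0, iters=4000, k=2):
    """Iterate to the attractor (an even iters preserves phase) and return k points."""
    n = n0
    for _ in range(iters):
        n = f(th, n)
    pts = []
    for _ in range(k):
        pts.append(n)
        n = f(th, n)
    return np.array(pts)           # pts[1] = f(th, pts[0]), i.e. n2_hat = A1 n1_hat

nhat = cycle_points(theta, 0.5)    # (n1_hat, n2_hat)

A = theta * np.exp(-nhat)          # A_i evaluated at n_i
H = nhat                           # H_i = n_i^T kron I_1 (scalar case)
dA_dth = np.exp(-nhat)             # partial vec A_i / partial theta
dA_dn = -theta * np.exp(-nhat)     # partial vec A_i / partial n_i^T

# Block matrices with the cyclic-permutation pattern of (10.117)-(10.120)
AA = np.array([[0.0, A[1]], [A[0], 0.0]])
HH = np.array([[0.0, H[1]], [H[0], 0.0]])
CC = np.diag(dA_dn)
DD = dA_dth.reshape(2, 1)

dN_dth = np.linalg.solve(np.eye(2) - AA - HH @ CC, HH @ DD)  # Eq. (10.121)

# Finite-difference check; starting each perturbed run at nhat[0] keeps the phase
h = 1e-6
fd = (cycle_points(theta + h, nhat[0]) - cycle_points(theta - h, nhat[0])) / (2 * h)
print(dN_dth.ravel(), fd)
```

The two printed vectors agree, confirming the block construction on this toy map.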

#### *10.7.2 Sensitivity of Weighted Densities and Time Averages*

The matrix $d\mathbb{N}/d\boldsymbol{\theta}^{\mathsf{T}}$ contains the sensitivity of every stage to every parameter at every point in the cycle. This potential overload of information can be simplified by calculating the sensitivities of weighted densities and/or time averages over the cycle. To do this, it is convenient to write the points in the cycle as an array (of dimension $s \times k$)

$$\mathbf{G} = \left(\hat{\mathbf{n}}\_1 \,\hat{\mathbf{n}}\_2 \,\cdots \,\hat{\mathbf{n}}\_k\right). \tag{10.122}$$

The block vector N is

$$\mathbb{N} = \text{vec } \mathbf{G}.\tag{10.123}$$

**Weighted densities**. Let $\mathbf{c}$ be a vector of weights, and let $\hat{N}\_i = \mathbf{c}^{\mathsf{T}} \hat{\mathbf{n}}\_i$ be the (scalar) weighted density at the $i$th point on the cycle. Then write

$$
\hat{\mathbf{n}} = \begin{pmatrix} \hat{N}\_1 \\ \vdots \\ \hat{N}\_k \end{pmatrix} \tag{10.124}
$$

The vector $\hat{\mathbf{n}}$ can be calculated from $\mathbb{N}$ as

$$\begin{aligned} \hat{\mathbf{n}} &= \left( \mathbf{c}^{\mathsf{T}} \hat{\mathbf{n}}\_{1} \cdots \mathbf{c}^{\mathsf{T}} \hat{\mathbf{n}}\_{k} \right)^{\mathsf{T}} \\ &= \text{vec} \left( \mathbf{c}^{\mathsf{T}} \mathbf{G} \right) \\ &= \left( \mathbf{I}\_k \otimes \mathbf{c}^{\mathsf{T}} \right) \text{vec} \, \mathbf{G} \\ &= \left( \mathbf{I}\_k \otimes \mathbf{c}^{\mathsf{T}} \right) \mathbb{N} \qquad \text{dimension} = k \times 1. \end{aligned} \tag{10.125}$$

**Time-averaged population vector**. Let $\mathbf{b}$ be a probability vector ($b\_i \geq 0$, $\mathbf{1}^{\mathsf{T}} \mathbf{b} = 1$) and define the time-averaged population vector as

$$
\bar{\mathbf{n}} = \sum\_{l=1}^{k} b\_l \hat{\mathbf{n}}\_l. \tag{10.126}
$$

Then

$$\begin{aligned} \bar{\mathbf{n}} &= \mathbf{G}\mathbf{b} \\ &= \left(\mathbf{b}^{\mathsf{T}} \otimes \mathbf{I}\_{s}\right) \text{vec}\,\mathbf{G} \\ &= \left(\mathbf{b}^{\mathsf{T}} \otimes \mathbf{I}\_{s}\right) \mathbb{N} \qquad \text{dimension} = s \times 1 \end{aligned} \tag{10.127}$$

**Time-averaged weighted density**. Taking the time average of the $\hat{N}\_i$ gives

$$\begin{aligned} \bar{N} &= \sum\_{l} b\_{l} \mathbf{c}^{\mathsf{T}} \hat{\mathbf{n}}\_{l} \\ &= \mathbf{c}^{\mathsf{T}} \mathbf{G} \mathbf{b} \\ &= \left( \mathbf{b}^{\mathsf{T}} \otimes \mathbf{c}^{\mathsf{T}} \right) \mathbb{N} \end{aligned} \tag{10.128}$$

Thus the sensitivities of the weighted densities, the time-averaged population, and the time-averaged weighted density are obtained by differentiating (10.125), (10.127), and (10.128) as

$$\frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\mathbf{I}\_{k} \otimes \mathbf{c}^{\mathsf{T}}\right) \frac{d\mathbb{N}}{d\boldsymbol{\theta}^{\mathsf{T}}} \tag{10.129}$$

$$\frac{d\bar{\mathbf{n}}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\mathbf{b}^{\mathsf{T}} \otimes \mathbf{I}\_{\mathsf{s}}\right) \frac{d\mathbb{N}}{d\boldsymbol{\theta}^{\mathsf{T}}} \tag{10.130}$$

$$\frac{d\bar{N}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\mathbf{b}^{\mathsf{T}} \otimes \mathbf{c}^{\mathsf{T}}\right) \frac{d\mathbb{N}}{d\boldsymbol{\theta}^{\mathsf{T}}} \tag{10.131}$$

where $d\mathbb{N}/d\boldsymbol{\theta}^{\mathsf{T}}$ is given by (10.121).
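In code, Eqs. (10.129)–(10.131) are single Kronecker-product contractions of $d\mathbb{N}/d\boldsymbol{\theta}^{\mathsf{T}}$. A minimal sketch, in which a random matrix merely stands in for a sensitivity matrix computed from (10.121):

```python
import numpy as np

# Sensitivities of weighted, averaged, and weighted-averaged densities,
# Eqs. (10.129)-(10.131). Shapes: s stages, k cycle points, p parameters.

s, k, p = 3, 2, 6
rng = np.random.default_rng(1)
dNdth = rng.standard_normal((k * s, p))   # stands in for dN/dtheta^T from (10.121)
c = rng.random(s)                         # weight vector c
b = np.full(k, 1.0 / k)                   # averaging (probability) vector b

dNhat = np.kron(np.eye(k), c) @ dNdth     # (10.129): (I_k kron c^T) dN/dtheta^T, k x p
dnbar = np.kron(b, np.eye(s)) @ dNdth     # (10.130): (b^T kron I_s) dN/dtheta^T, s x p
dNbar = np.kron(b, c) @ dNdth             # (10.131): (b^T kron c^T) dN/dtheta^T, 1 x p

# Consistency: averaging then weighting equals weighting then averaging
print(np.allclose(b @ dNhat, dNbar), np.allclose(c @ dnbar, dNbar))
```

The two `True` values reflect the mixed-product property of the Kronecker product, which is what makes the three formulas mutually consistent.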

**Example 7 A 2-cycle in the** *Tribolium* **model** A series of experiments on *Tribolium* reported by Dennis et al. (1995) produced stable 2-cycles by experimentally manipulating the adult mortality $\mu\_a$. Using the model in Example 2 and the estimated parameters

$$\begin{aligned} b &= 11.677 \\ c\_{ea} &= 1.100 \times 10^{-2} \\ c\_{el} &= 9.3 \times 10^{-3} \\ c\_{pa} &= 1.78 \times 10^{-2} \\ \mu\_a &= 1.108 \times 10^{-1} \\ \mu\_l &= 5.129 \times 10^{-1} \end{aligned}$$

(Dennis et al. 1995, Table 1) leads to a 2-cycle

$$
\hat{\mathbf{n}}\_1 = \begin{pmatrix} 325.3 \\ 8.9 \\ 118.5 \end{pmatrix} \qquad \hat{\mathbf{n}}\_2 = \begin{pmatrix} 18.2 \\ 158.4 \\ 106.4 \end{pmatrix}, \tag{10.132}
$$

in which the population oscillates between a state dominated by larvae and adults and a state dominated by pupae and adults.
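The cycle can be reproduced by iterating the LPA model of Dennis et al. (1995) (the model of Example 2): larvae, pupae, and adults with $L' = bA e^{-c\_{el}L - c\_{ea}A}$, $P' = (1-\mu\_l)L$, $A' = P e^{-c\_{pa}A} + (1-\mu\_a)A$. Parameter names follow the text; the iteration code is our sketch.

```python
import numpy as np

# Iterate the LPA model to the stable 2-cycle of Eq. (10.132).
b, c_ea, c_el, c_pa = 11.677, 1.100e-2, 9.3e-3, 1.78e-2
mu_a, mu_l = 1.108e-1, 5.129e-1

def lpa(n):
    L, P, A = n
    return np.array([b * A * np.exp(-c_el * L - c_ea * A),   # recruitment to larvae
                     (1.0 - mu_l) * L,                       # larvae -> pupae
                     P * np.exp(-c_pa * A) + (1.0 - mu_a) * A])  # pupae -> adults

n = np.array([300.0, 10.0, 120.0])   # start near the first cycle point
for _ in range(10000):               # an even number of steps keeps the phase
    n = lpa(n)
n1, n2 = n, lpa(n)
print(np.round(n1, 1), np.round(n2, 1))   # the two points of the 2-cycle
```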

As an example of the rich sensitivity analyses possible for even such a simple model, consider the elasticity of the population vector $\hat{\mathbf{n}}\_i$, of the total population $\hat{N}\_i = \mathbf{1}^{\mathsf{T}} \hat{\mathbf{n}}\_i$, of the total population respiration $\hat{R}\_i = \mathbf{c}^{\mathsf{T}} \hat{\mathbf{n}}\_i$ (with $\mathbf{c}$ the vector of stage-specific respiration rates from Example 2), and of the time averages $\bar{\mathbf{n}}$, $\bar{N}$, and $\bar{R}$. The results are collected in Fig. 10.8.

First, the elasticities of the $\hat{\mathbf{n}}\_i$ differ from stage to stage and from one point on the cycle to another (Fig. 10.8a). Increases in fecundity, for example, increase the density of larvae and reduce the density of pupae in $\hat{\mathbf{n}}\_1$, but have the opposite effects in $\hat{\mathbf{n}}\_2$. The elasticities to $b$, $c\_{ea}$, and $c\_{el}$ are much larger than those to the other parameters (cf. the elasticities of the equilibrium $\hat{\mathbf{n}}$ in Fig. 10.1).

The elasticities of total population are similar at the two points in the cycle (Fig. 10.8b), except that larval mortality $\mu\_l$ has a large negative effect on $\hat{N}\_2$, but only a small effect on $\hat{N}\_1$. The elasticities of total respiration $\hat{R}\_i$, however, are different at the two points in the cycle (Fig. 10.8c).

The elasticities of the time-averaged population vector $\bar{\mathbf{n}}$ (Fig. 10.8d) are similar to those of the equilibrium vector in Fig. 10.1 (although they need not be). This pattern is not predictable from the patterns of the elasticities of the population vectors $\hat{\mathbf{n}}\_1$ and $\hat{\mathbf{n}}\_2$ (Fig. 10.8a).

Finally, the elasticities of the time averages $\bar{N}$ and $\bar{R}$ of the weighted densities are similar to each other and to the elasticities of the time-averaged population $\bar{\mathbf{n}}$.

The sensitivity analysis of cycles thus depends very much on the dependent variables of interest. The matrix $d\mathbb{N}/d\boldsymbol{\theta}^{\mathsf{T}}$ (Fig. 10.8a) contains 36 pieces of information: the effects of 6 parameters on 3 stages at 2 points in the cycle. A focus on weighted density reduces this to 12 (Fig. 10.8b,c), but the results may depend very much on the particular weighting vector chosen. A focus on time averages reduces the information from 36 to 18 numbers (Fig. 10.8d), and the responses of the time-averaged weighted densities are finally described by just 6 numbers. The good news

**Fig. 10.8** Analysis of a 2-cycle in the *Tribolium* model. (**a**) Elasticity of the density of each stage, with respect to each parameter, at $\hat{\mathbf{n}}\_1$ and $\hat{\mathbf{n}}\_2$. (**b**) Elasticity of the total population $\hat{N}$ at each point in the cycle. (**c**) Elasticity of the total respiration $\hat{R}$ at each point in the cycle. (**d**) Elasticity of the time-averaged population $\bar{\mathbf{n}}$. (**e**) Elasticity of the time-averaged total population $\bar{N}$ and the time-averaged total respiration $\bar{R}$

is that Eqs. (10.121), (10.125), (10.127), and (10.128) make it easy to compute all these sensitivities.

#### *10.7.3 Sensitivity of Temporal Variance in Density*

The variance over a cycle in a weighted density $\hat{N}$ can be written

$$V(\hat{N}) = E(\hat{N}^2) - \left[E(\hat{N})\right]^2\tag{10.133}$$

where $E(\hat{N}) = \bar{N} = \mathbf{c}^{\mathsf{T}} \mathbf{G} \mathbf{b}$ and

$$E(\hat{N}^2) = \sum\_{l=1}^{k} b\_l \left(\mathbf{c}^\mathsf{T} \hat{\mathbf{n}}\_l\right)^2 \tag{10.134}$$

$$= (\mathbf{c} \circ \mathbf{c})^{\mathsf{T}} (\mathbf{G} \circ \mathbf{G})\, \mathbf{b} \tag{10.135}$$

Taking the differential of $E(\hat{N}^2)$ and applying the vec operator gives

$$dE(\hat{N}^2) = 2\left[\mathbf{b}^\mathsf{T} \otimes (\mathbf{c} \circ \mathbf{c})^\mathsf{T}\right] \mathcal{D}\left(\mathbb{N}\right) \, d\mathbb{N}.\tag{10.136}$$

Combining this with the differential of $\left[E(\hat{N})\right]^2$ gives the sensitivity of $V(\hat{N})$:

$$\frac{dV(\hat{N})}{d\boldsymbol{\theta}^{\mathsf{T}}} = 2\left\{ \left[ \mathbf{b}^{\mathsf{T}} \otimes (\mathbf{c} \circ \mathbf{c})^{\mathsf{T}} \right] \mathcal{D}\left(\mathbb{N}\right) - \bar{N} \left( \mathbf{b}^{\mathsf{T}} \otimes \mathbf{c}^{\mathsf{T}} \right) \right\} \frac{d\mathbb{N}}{d\boldsymbol{\theta}^{\mathsf{T}}} \tag{10.137}$$

where $d\mathbb{N}/d\boldsymbol{\theta}^{\mathsf{T}}$ is given by (10.121). The extension to higher moments, should one want to know, say, the sensitivity of the skewness of population size over a cycle, is possible.
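The bracketed gradient in (10.137) can be checked numerically, computing $E(\hat{N}^2)$ in the form (10.135) and differentiating $V(\hat{N})$ with respect to the entries of $\mathbb{N}$ by finite differences; the cycle points and weights below are arbitrary test values.

```python
import numpy as np

# Gradient of V with respect to vec G: twice the bracketed factor of (10.137).
rng = np.random.default_rng(0)
s, k = 3, 2
G = rng.random((s, k)) * 100          # cycle points as columns of G
b = np.full(k, 1.0 / k)               # time-averaging weights
c = rng.random(s)                     # density weights
N = G.ravel(order="F")                # the block vector: vec G

def V(Nvec):
    G_ = Nvec.reshape(s, k, order="F")
    EN = c @ G_ @ b                   # E(N-hat) = c^T G b
    EN2 = (c * c) @ (G_ * G_) @ b     # E(N-hat^2) in the form of (10.135)
    return EN2 - EN**2                # Eq. (10.133)

Nbar = c @ G @ b
grad = 2 * (np.kron(b, c * c) * N - Nbar * np.kron(b, c))   # from (10.137)

h = 1e-5
fd = np.array([(V(N + h * e) - V(N - h * e)) / (2 * h) for e in np.eye(s * k)])
print(np.allclose(grad, fd, rtol=1e-5, atol=1e-6))
```

Because $V$ is quadratic in $\mathbb{N}$, the central differences match the analytical gradient essentially to machine precision.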

#### *10.7.4 Periodic Dynamics in Periodic Environments*

Periodic environments (e.g., seasons within a year) are described by periodic products of matrices. If the environmental cycle contains $p$ phases, then matrices $\mathbf{A}\_1, \ldots, \mathbf{A}\_p$ describe the dynamics at each phase, and the periodic product $\mathbf{A}\_p \cdots \mathbf{A}\_1$ projects the population over an entire cycle. Nonlinear periodic models permit the $\mathbf{A}\_i$ to depend on the population vector at any point in the cycle, including delayed dependence (e.g., the reproductive success of an individual plant in the fall may depend on the density it experienced in the spring). A fixed point on the inter-annual time scale is a $p$-cycle on the seasonal time scale. A $k$-cycle on the inter-annual scale corresponds to a $kp$-cycle on the seasonal time scale. The sensitivity analysis of these models is given by Caswell and Shyu (2012) and presented here in Chap. 8. For an application to the dynamics of an invasive plant population, see Shyu et al. (2013).

#### **10.8 Dynamic Environmental Feedback Models**

The commonly encountered forms of density dependence are usually a shorthand for a feedback between a population and some aspect of its environment.<sup>18</sup> The static feedback model of Sect. 10.3 begins to incorporate environmental feedback, but assumes that the environmental variable $\mathbf{g}(t)$ has no inherent dynamics of its own. A more general, dynamic environmental feedback model can be written

$$\mathbf{n}(t+1) = \mathbf{A}[\theta, \mathbf{n}(t), \mathbf{g}(t)]\mathbf{n}(t)$$

$$\mathbf{g}(t+1) = \mathbf{B}[\theta, \mathbf{n}(t), \mathbf{g}(t)]\mathbf{g}(t) \tag{10.138}$$

allowing for **n***(t)* to depend on both the environment and on its own density, and likewise for the environmental factor.

The sensitivity of the equilibrium of (10.138) can be found using an approach similar to that applied above to cycles. At equilibrium,

$$\hat{\mathbf{n}} = \mathbf{A}[\theta, \hat{\mathbf{n}}, \hat{\mathbf{g}}] \hat{\mathbf{n}} \tag{10.139}$$

$$
\hat{\mathbf{g}} = \mathbf{B}[\theta, \hat{\mathbf{n}}, \hat{\mathbf{g}}] \hat{\mathbf{g}} \tag{10.140}
$$

Differentiating both sides of each equation, expanding *d*vec **A** and *d*vec **B**, and applying the vec operator gives

$$d\hat{\mathbf{n}} = \mathbf{A} \left( d\hat{\mathbf{n}} \right) + \left( \hat{\mathbf{n}}^{\mathsf{T}} \otimes \mathbf{I}\_s \right) \left( \frac{\partial \text{vec}\,\mathbf{A}}{\partial \boldsymbol{\theta}^{\mathsf{T}}} d\boldsymbol{\theta} + \frac{\partial \text{vec}\,\mathbf{A}}{\partial \mathbf{n}^{\mathsf{T}}} d\hat{\mathbf{n}} + \frac{\partial \text{vec}\,\mathbf{A}}{\partial \mathbf{g}^{\mathsf{T}}} d\hat{\mathbf{g}} \right) \quad (10.141)$$

$$d\hat{\mathbf{g}} = \mathbf{B} \left( d\hat{\mathbf{g}} \right) + \left( \hat{\mathbf{g}}^{\mathsf{T}} \otimes \mathbf{I}\_q \right) \left( \frac{\partial \text{vec}\,\mathbf{B}}{\partial \boldsymbol{\theta}^{\mathsf{T}}} d\boldsymbol{\theta} + \frac{\partial \text{vec}\,\mathbf{B}}{\partial \mathbf{n}^{\mathsf{T}}} d\hat{\mathbf{n}} + \frac{\partial \text{vec}\,\mathbf{B}}{\partial \mathbf{g}^{\mathsf{T}}} d\hat{\mathbf{g}} \right). \quad (10.142)$$

Applying the identification theorem and the chain rule gives

<sup>18</sup>Early writers even interpreted the simple logistic equation as an interplay between a biotic potential for exponential growth and an environmental resistance due to lack of resources or interaction with predators (e.g., Chapman 1931). Incorporating a fully dynamic feedback greatly expands the range of phenomena that can be explained (see de Roos and Persson (2013) for an extensive development of this approach).

$$\begin{aligned} \frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\mathsf{T}}} &= \mathbf{A} \frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\mathsf{T}}} + \left( \hat{\mathbf{n}}^{\mathsf{T}} \otimes \mathbf{I}\_s \right) \frac{\partial \text{vec}\,\mathbf{A}}{\partial \boldsymbol{\theta}^{\mathsf{T}}} + \left( \hat{\mathbf{n}}^{\mathsf{T}} \otimes \mathbf{I}\_s \right) \frac{\partial \text{vec}\,\mathbf{A}}{\partial \mathbf{n}^{\mathsf{T}}} \frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\mathsf{T}}} \\ &\quad + \left( \hat{\mathbf{n}}^{\mathsf{T}} \otimes \mathbf{I}\_s \right) \frac{\partial \text{vec}\,\mathbf{A}}{\partial \mathbf{g}^{\mathsf{T}}} \frac{d\hat{\mathbf{g}}}{d\boldsymbol{\theta}^{\mathsf{T}}} \end{aligned} \tag{10.143}$$

with a similar expression for $d\hat{\mathbf{g}}/d\boldsymbol{\theta}^{\mathsf{T}}$. All matrices and their derivatives are evaluated at the equilibrium $(\hat{\mathbf{n}}, \hat{\mathbf{g}})$. This system can be written in block matrix form by defining

$$\mathbf{H} \equiv \left(\hat{\mathbf{n}}^{\mathsf{T}} \otimes \mathbf{I}\_{\mathsf{s}}\right) \tag{10.144}$$

$$\mathbf{J} \equiv \left( \hat{\mathbf{g}}^{\mathsf{T}} \otimes \mathbf{I}\_{q} \right) \tag{10.145}$$

Then define

$$\mathbb{A} = \begin{pmatrix} \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \mathbf{B} \end{pmatrix} \tag{10.146}$$

$$\mathbb{H} = \begin{pmatrix} \mathbf{H} & \mathbf{0} \\ \mathbf{0} & \mathbf{J} \end{pmatrix} \tag{10.147}$$

$$\mathbb{C} = \begin{pmatrix} \frac{\partial \text{vec}\,\mathbf{A}}{\partial \mathbf{n}^{\mathsf{T}}} & \frac{\partial \text{vec}\,\mathbf{A}}{\partial \mathbf{g}^{\mathsf{T}}} \\ \frac{\partial \text{vec}\,\mathbf{B}}{\partial \mathbf{n}^{\mathsf{T}}} & \frac{\partial \text{vec}\,\mathbf{B}}{\partial \mathbf{g}^{\mathsf{T}}} \end{pmatrix} \tag{10.148}$$

$$\mathbb{D} = \begin{pmatrix} \frac{\partial \text{vec} \, \mathbf{A}}{\partial \theta^{\mathsf{T}}}\\ \frac{\partial \text{vec} \, \mathbf{B}}{\partial \theta^{\mathsf{T}}} \end{pmatrix} \tag{10.149}$$

$$\mathbb{N} = \begin{pmatrix} \hat{\mathbf{n}} \\ \hat{\mathbf{g}} \end{pmatrix} \tag{10.150}$$

In terms of these matrices,

$$\frac{d\mathbb{N}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \mathbb{H}\mathbb{D} + \left(\mathbb{A} + \mathbb{H}\mathbb{C}\right) \frac{d\mathbb{N}}{d\boldsymbol{\theta}^{\mathsf{T}}}.\tag{10.151}$$

Solving for $d\mathbb{N}/d\boldsymbol{\theta}^{\mathsf{T}}$ gives the sensitivity of both the population and the environmental factor,

$$\frac{d\mathbb{N}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\mathbf{I}\_{s+q} - \mathbb{A} - \mathbb{H}\mathbb{C}\right)^{-1}\mathbb{H}\mathbb{D}.\tag{10.152}$$
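A scalar ($s = q = 1$) check of (10.152) is easy to construct. The feedback functions below, $A[\theta,n,g] = \theta e^{-n - 0.5g}$ and $B[\theta,n,g] = e^{0.2(1 - g - 0.1n)}$, are invented for illustration; they are chosen so that the equilibrium, and hence its exact derivative, is available in closed form: $A = B = 1$ at equilibrium gives $\hat{n} = (\ln\theta - 0.5)/0.95$ and $\hat{g} = 1 - 0.1\hat{n}$.

```python
import numpy as np

# Scalar illustration of Eq. (10.152) for the dynamic feedback model
#   n(t+1) = A[theta, n, g] n,  g(t+1) = B[theta, n, g] g.

theta = 3.0
n_hat = (np.log(theta) - 0.5) / 0.95        # closed-form equilibrium
g_hat = 1.0 - 0.1 * n_hat

# Derivatives of A and B at the equilibrium, where A = B = 1
dA_dth, dA_dn, dA_dg = 1.0 / theta, -1.0, -0.5
dB_dth, dB_dn, dB_dg = 0.0, -0.02, -0.2

H, J = n_hat, g_hat                          # Eqs. (10.144)-(10.145), scalar case
AA = np.diag([1.0, 1.0])                     # blockdiag(A, B) at equilibrium
HH = np.diag([H, J])                         # Eq. (10.147)
CC = np.array([[dA_dn, dA_dg],               # Eq. (10.148)
               [dB_dn, dB_dg]])
DD = np.array([[dA_dth], [dB_dth]])          # Eq. (10.149)

dN = np.linalg.solve(np.eye(2) - AA - HH @ CC, HH @ DD).ravel()  # Eq. (10.152)
print(dN)   # compare: d n_hat/d theta = 1/(0.95 theta), d g_hat/d theta = -0.1/(0.95 theta)
```

Differentiating the closed-form equilibrium directly gives $d\hat{n}/d\theta = 1/(0.95\,\theta)$ and $d\hat{g}/d\theta = -0.1/(0.95\,\theta)$, which the block formula reproduces.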

#### **10.9 Stage-Structured Epidemics**

The transmission of infectious diseases is a source of nonlinearity because the rate of transmission depends on the abundance of infected and non-infected individuals. When demographic structure is added to the picture, the models can become complicated because the transmission process, the recovery process, and the consequences of infection may all vary among age classes or stages.

Klepac and Caswell (2011) developed a general framework for stage-classified epidemics, using the vec-permutation formulation (e.g., Chaps. 5 and 6). Individuals were jointly classified by stage and infection category, and nonlinearity was introduced by the disease transmission process. Klepac and Caswell (2011) calculated sensitivities and elasticities of equilibria and cycles of the stage × infection distribution, and of stage-specific prevalence, with respect to parameters specifying demographic, infection, and recovery processes.

Coupling demography and epidemiology requires attention to time scales. Suppose that the demographic processes operate on one time scale: say, years. For some diseases, the infection/recovery process might operate on a much longer time scale (decades). Or the disease might play out on a much shorter time scale (weeks). When the disease time scale is shorter than the demographic time scale, the matrices in Klepac's model that define disease transmission operate many times within a single year; the result is a periodic model on the infection time scale. See Klepac and Caswell (2011) for details.

#### **10.10 Moments of Longevity in Nonlinear Models**

The statistics of longevity (e.g., life expectancy) are traditionally calculated from linear age-classified models (see Chap. 4) or from linear stage-classified models (see Chap. 5). In a nonlinear model at equilibrium, the projection matrix is constant and an individual experiences a fixed schedule of vital rates, from which all the usual statistics of longevity can be calculated. Write the density-dependent projection matrix as

$$\mathbf{A}[\theta,\mathbf{n}] = \mathbf{U}[\theta,\mathbf{n}] + \mathbf{F}[\theta,\mathbf{n}] \tag{10.153}$$

where **U** contains the transition probabilities for individuals already present in the population and **F** describes the production of new individuals by reproduction. The matrix **U** is the transient matrix of an absorbing Markov chain, with death as an absorbing state. The fundamental matrix of this chain at equilibrium is

$$\mathbf{N}[\theta, \hat{\mathbf{n}}] = \left(\mathbf{I}\_s - \mathbf{U}[\theta, \hat{\mathbf{n}}]\right)^{-1} \tag{10.154}$$

where the inverse is guaranteed to exist if the spectral radius of $\mathbf{U}$ is less than 1. The $(i,j)$ element of $\mathbf{N}$ is the expected time spent in stage $i$, before death, by an individual in stage $j$.

As in Chap. 4, the vector $\boldsymbol{\eta}\_1$ containing the mean longevity of each age class or stage is given by

$$
\boldsymbol{\eta}\_1^\mathsf{T} = \mathbf{1}\_s^\mathsf{T} \mathbf{N}[\hat{\mathbf{n}}].\tag{10.155}
$$

The moments of longevity and other indices are calculated from $\mathbf{N}[\boldsymbol{\theta}, \hat{\mathbf{n}}]$ just as in the linear case. All the sensitivity results of Chaps. 4 and 5 apply directly, except that the derivative of $\mathbf{N}[\boldsymbol{\theta}, \hat{\mathbf{n}}]$ must include both the direct effects of $\boldsymbol{\theta}$ and the indirect effects through $\hat{\mathbf{n}}$. For convenience, write $\hat{\mathbf{N}}$ and $\hat{\mathbf{U}}$ for the matrices at equilibrium. Then

$$d\text{vec}\,\hat{\mathbf{N}} = \left(\hat{\mathbf{N}}^{\mathsf{T}} \otimes \hat{\mathbf{N}}\right) d\text{vec}\,\hat{\mathbf{U}} \tag{10.156}$$

$$= \left(\hat{\mathbf{N}}^{\mathsf{T}} \otimes \hat{\mathbf{N}}\right) \left[ \frac{\partial \text{vec}\,\hat{\mathbf{U}}}{\partial \boldsymbol{\theta}^{\mathsf{T}}} d\boldsymbol{\theta} + \frac{\partial \text{vec}\,\hat{\mathbf{U}}}{\partial \mathbf{n}^{\mathsf{T}}} d\hat{\mathbf{n}} \right] \tag{10.157}$$

where $\hat{\mathbf{U}}$, $\hat{\mathbf{N}}$, and the derivatives of $\mathbf{U}$ are all evaluated at equilibrium and $d\hat{\mathbf{n}}/d\boldsymbol{\theta}^{\mathsf{T}}$ is given by (10.16). Comparing this with equation (4.34) shows that the nonlinearity adds an extra term, capturing the way that changes in $\boldsymbol{\theta}$ affect the vital rates through changes in equilibrium density.

This approach can be used to generalize the results for higher moments of longevity (Chaps. 4, 5, and 11) to the nonlinear case.
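The matrix-calculus identity behind (10.156) is easy to verify numerically. The sketch below uses an arbitrary parameterized $\mathbf{U}(\theta)$, so only the direct term of (10.157) is present; in the nonlinear case $d\text{vec}\,\hat{\mathbf{U}}$ would also carry the term through $d\hat{\mathbf{n}}$, but the identity itself is unchanged.

```python
import numpy as np

# Check dvec N = (N^T kron N) dvec U for N = (I - U)^{-1}.
theta = 0.3

def U(th):
    # Arbitrary substochastic transient matrix; only u_11 depends on theta
    return np.array([[th, 0.0], [0.4, 0.5]])

def vecN(th):
    return np.linalg.inv(np.eye(2) - U(th)).ravel(order="F")  # column-major vec

N = np.linalg.inv(np.eye(2) - U(theta))
dvecU_dth = np.array([1.0, 0.0, 0.0, 0.0])     # d vec U / d theta (u_11 only)
dvecN_dth = np.kron(N.T, N) @ dvecU_dth        # the identity of (10.156)

h = 1e-6
fd = (vecN(theta + h) - vecN(theta - h)) / (2 * h)
print(np.allclose(dvecN_dth, fd, rtol=1e-6))
```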

#### **10.11 Summary**

Table 10.3 lists the perturbation analysis results in this chapter; they comprise a fairly complete analysis for nonlinear demographic models. The nonlinearities may arise from density dependence, frequency dependence, environmental feedback, proportional population structure calculations, or recruitment subsidy. The sensitivity calculations accommodate a wide range of dependent variables and the calculation of both sensitivity and elasticity with respect to any kind of demographic parameters.

As in other chapters, most of the results in this chapter follow a straightforward method:

1. Write the model, specifying the dependence of the vital rates on *θ* and **n**.


**Table 10.3** Summary of models and main sensitivity results of the chapter. Extending sensitivities to additional dependent variables (ratios, averages, rates,


#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Part V Markov Chains**

# **Chapter 11 Sensitivity Analysis of Discrete Markov Chains**

#### **11.1 Introduction**

As we have seen repeatedly, Markov chains are often used as mathematical models of demographic (as well as other natural) phenomena, with transition probabilities defined in terms of parameters that are of interest in the scientific question at hand. Sensitivity analysis is an important way to quantify the effects of changes in these parameters on the behavior of the chain. This chapter revisits, in a more rigorous way, some of the quantities already explored for absorbing Markov chains (Chaps. 4, 5, and 6). It will also consider ergodic Markov chains (in which no absorbing states exist), and calculate the sensitivity of the stationary distribution and measures of the rate of convergence.

Perturbation (or sensitivity) analysis is a long-standing problem in the theory of Markov chains (Schweitzer 1968; Conlisk 1985; Golub and Meyer 1986; Funderlic and Meyer 1986; Seneta 1988, 1993; Meyer 1994; Cho and Meyer 2000; Mitrophanov 2003, 2005; Mitrophanov et al. 2005; Kirkland et al. 2008). When Markov chains are applied as models of physical, biological, or social systems, they are often defined as functions of parameters that have substantive meaning.

Chapter 11 is modified, under the terms of a Journal Publishing Agreement with Elsevier Publishers, from: Caswell, H. Sensitivity analysis of discrete Markov chains via matrix calculus. Linear Algebra and its Applications 438:1727–1745. ©Elsevier.

#### **11.2 Absorbing Chains**

The transition matrix for a discrete-time absorbing chain can be written

$$\mathbf{P} = \begin{pmatrix} \mathbf{U} & \mathbf{0} \\ \mathbf{M} & \mathbf{I} \end{pmatrix} \tag{11.1}$$

where $\mathbf{U}$, of dimension $s \times s$, is the transition matrix among the $s$ transient states, and $\mathbf{M}$, of dimension $a \times s$, contains probabilities of transition from the transient states to the $a$ absorbing states. Because we are concerned here with absorption, but not what happens after, we ignore transitions among absorbing states; hence the identity matrix ($a \times a$) in the lower right corner. The matrices $\mathbf{U}[\boldsymbol{\theta}]$ and $\mathbf{M}[\boldsymbol{\theta}]$ are functions of a vector of parameters. We assume that $\boldsymbol{\theta}$ varies over some set in which the column sums of $\mathbf{P}$ are 1 and the spectral radius of $\mathbf{U}$ is strictly less than one.
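A minimal numerical instance of the block structure (11.1), with $s = 2$ transient states and $a = 1$ absorbing state; the transition probabilities are illustrative.

```python
import numpy as np

# Assemble P in the block form of Eq. (11.1); columns are "from" states,
# so each column of P sums to 1.
U = np.array([[0.3, 0.1],
              [0.4, 0.5]])            # transient -> transient
M = np.array([[0.3, 0.4]])            # transient -> absorbing (death)
P = np.block([[U, np.zeros((2, 1))],
              [M, np.ones((1, 1))]])  # identity (here 1x1) in the lower right

rho = np.max(np.abs(np.linalg.eigvals(U)))   # spectral radius of U
print(P.sum(axis=0), rho)             # column sums are all 1; rho < 1
```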

#### *11.2.1 Occupancy: Visits to Transient States*

Let $\nu\_{ij}$ be the number of visits to transient state $i$, prior to absorption, by an individual starting in transient state $j$. The expectations of the $\nu\_{ij}$ are the entries of the fundamental matrix $\mathbf{N} = \mathbf{N}\_1 = \left( E(\nu\_{ij}) \right)$:

$$\mathbf{N} = (\mathbf{I} - \mathbf{U})^{-1} \tag{11.2}$$

(e.g., Kemeny and Snell 1960; Iosifescu 1980). Let $\mathbf{N}\_k = \left( E(\nu\_{ij}^k) \right)$ be a matrix containing the $k$th moments about the origin of the $\nu\_{ij}$. The first several of these matrices are (Iosifescu 1980, Thm. 3.1)

$$\mathbf{N}\_1 = \left(\mathbf{I} - \mathbf{U}\right)^{-1} \tag{11.3}$$

$$\mathbf{N}\_2 = \left(2\mathbf{N}\_{\rm dg} - \mathbf{I}\right)\mathbf{N}\_1\tag{11.4}$$

$$\mathbf{N}\_3 = \left(6\mathbf{N}\_{\rm dg}^2 - 6\mathbf{N}\_{\rm dg} + \mathbf{I}\right)\mathbf{N}\_{\rm l} \tag{11.5}$$

$$\mathbf{N}\_4 = \left(24\mathbf{N}\_{\rm dg}^3 - 36\mathbf{N}\_{\rm dg}^2 + 14\mathbf{N}\_{\rm dg} - \mathbf{I}\right)\mathbf{N}\_1. \tag{11.6}$$

**Theorem 11.2.1** *Let $\mathbf{N}\_k$ be the matrix of $k$th moments of the $\nu\_{ij}$, as given by (11.3), (11.4), (11.5), and (11.6). The sensitivities of $\mathbf{N}\_k$, for $k = 1, \ldots, 4$, are*

$$d\text{vec}\,\mathbf{N}\_1 = \left(\mathbf{N}\_1^{\mathsf{T}} \otimes \mathbf{N}\_1\right) d\text{vec}\,\mathbf{U} \tag{11.7}$$

$$d\text{vec}\,\mathbf{N}\_2 = \left[2\left(\mathbf{I} \otimes \mathbf{N}\_{\mathrm{dg}}\right) - \mathbf{I}\_{s^2}\right] d\text{vec}\,\mathbf{N}\_1 + 2\left(\mathbf{N}\_1^{\mathsf{T}} \otimes \mathbf{I}\right) d\text{vec}\,\mathbf{N}\_{\mathrm{dg}} \tag{11.8}$$

$$\begin{aligned} d\text{vec}\,\mathbf{N}\_3 &= \left[\mathbf{I} \otimes \left(6\mathbf{N}\_{\mathrm{dg}}^2 - 6\mathbf{N}\_{\mathrm{dg}} + \mathbf{I}\right)\right] d\text{vec}\,\mathbf{N}\_1 \\ &\quad + \left[6\left(\mathbf{N}\_1^{\mathsf{T}}\mathbf{N}\_{\mathrm{dg}} \otimes \mathbf{I}\right) + 6\left(\mathbf{N}\_1^{\mathsf{T}} \otimes \mathbf{N}\_{\mathrm{dg}}\right) - 6\left(\mathbf{N}\_1^{\mathsf{T}} \otimes \mathbf{I}\right)\right] d\text{vec}\,\mathbf{N}\_{\mathrm{dg}} \end{aligned} \tag{11.9}$$

$$\begin{aligned} d\text{vec}\,\mathbf{N}\_4 &= \left[\mathbf{I} \otimes \left(24\mathbf{N}\_{\mathrm{dg}}^3 - 36\mathbf{N}\_{\mathrm{dg}}^2 + 14\mathbf{N}\_{\mathrm{dg}} - \mathbf{I}\right)\right] d\text{vec}\,\mathbf{N}\_1 \\ &\quad + \left[24\left(\mathbf{N}\_1^{\mathsf{T}}\mathbf{N}\_{\mathrm{dg}}^2 \otimes \mathbf{I}\right) + 24\left(\mathbf{N}\_1^{\mathsf{T}}\mathbf{N}\_{\mathrm{dg}} \otimes \mathbf{N}\_{\mathrm{dg}}\right) + 24\left(\mathbf{N}\_1^{\mathsf{T}} \otimes \mathbf{N}\_{\mathrm{dg}}^2\right) \right. \\ &\qquad \left. - 36\left(\mathbf{N}\_1^{\mathsf{T}}\mathbf{N}\_{\mathrm{dg}} \otimes \mathbf{I}\right) - 36\left(\mathbf{N}\_1^{\mathsf{T}} \otimes \mathbf{N}\_{\mathrm{dg}}\right) + 14\left(\mathbf{N}\_1^{\mathsf{T}} \otimes \mathbf{I}\right)\right] d\text{vec}\,\mathbf{N}\_{\mathrm{dg}} \end{aligned} \tag{11.10}$$

*where (see Sect.* 2.8*)*

$$d\mathbf{N}\_{\mathrm{dg}} = \mathbf{I} \circ d\mathbf{N}\_1 \tag{11.11}$$

$$d\text{vec}\,\mathbf{N}\_{\mathrm{dg}} = \mathcal{D}\,(\text{vec}\,\mathbf{I})\, d\text{vec}\,\mathbf{N}\_1.\tag{11.12}$$

*Proof* The result (11.7) is derived in Caswell (2006, Section 3.1). For $k > 1$, considering $\mathbf{N}\_k$ as a function of $\mathbf{N}\_1$ and $\mathbf{N}\_{\mathrm{dg}}$, the total differential of $\mathbf{N}\_k$ is

$$d\text{vec}\,\mathbf{N}\_k = \frac{\partial \text{vec}\,\mathbf{N}\_k}{\partial\, \text{vec}^{\mathsf{T}} \mathbf{N}\_1}\, d\text{vec}\,\mathbf{N}\_1 + \frac{\partial \text{vec}\,\mathbf{N}\_k}{\partial\, \text{vec}^{\mathsf{T}} \mathbf{N}\_{\mathrm{dg}}}\, d\text{vec}\,\mathbf{N}\_{\mathrm{dg}}.\tag{11.13}$$

The two terms of (11.13) are the partial differentials of $\text{vec}\,\mathbf{N}\_k$, obtained by taking differentials treating only $\mathbf{N}\_1$ or only $\mathbf{N}\_{\mathrm{dg}}$ as variable, respectively. Denote these partial differentials by $\partial\_{\mathbf{N}\_1}$ and $\partial\_{\mathbf{N}\_{\mathrm{dg}}}$. Differentiating $\mathbf{N}\_2$ in (11.4) gives

$$\partial\_{\mathbf{N}\_1} \mathbf{N}\_2 = 2\mathbf{N}\_{\mathrm{dg}} \left( d\mathbf{N}\_1 \right) - d\mathbf{N}\_1 \tag{11.14}$$

$$
\partial\_{\mathbf{N}\_{\mathrm{dg}}} \mathbf{N}\_2 = 2 \left( d\mathbf{N}\_{\mathrm{dg}} \right) \mathbf{N}\_1.\tag{11.15}
$$

Applying the vec operator gives

$$\partial\_{\mathbf{N}\_1} \text{vec}\,\mathbf{N}\_2 = \left[ 2 \left( \mathbf{I} \otimes \mathbf{N}\_{\mathrm{dg}} \right) - \mathbf{I}\_{s^2} \right] d\text{vec}\,\mathbf{N}\_1 \tag{11.16}$$

$$\partial\_{\mathbf{N}\_{\mathrm{dg}}} \text{vec}\,\mathbf{N}\_2 = 2\left(\mathbf{N}\_1^{\mathsf{T}} \otimes \mathbf{I}\right) d\text{vec}\,\mathbf{N}\_{\mathrm{dg}},\tag{11.17}$$

and (11.13) becomes

$$d\text{vec}\,\mathbf{N}\_2 = \left[2\left(\mathbf{I} \otimes \mathbf{N}\_{\mathrm{dg}}\right) - \mathbf{I}\_{s^2}\right] d\text{vec}\,\mathbf{N}\_1 + 2\left(\mathbf{N}\_1^{\mathsf{T}} \otimes \mathbf{I}\right) d\text{vec}\,\mathbf{N}\_{\mathrm{dg}},\tag{11.18}$$

which is (11.8). The derivations of *d*vec **N**<sup>3</sup> and *d*vec **N**<sup>4</sup> follow the same sequence of steps. The details are given in Appendix A.

The derivatives of **N**2, **N**3, and **N**<sup>4</sup> can be used to study the variance, standard deviation, coefficient of variation, skewness, and kurtosis of the number of visits to the transient states (Caswell 2006, 2009, 2011).
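These formulas are straightforward to explore numerically. The sketch below uses a hypothetical 3-state transient matrix **U** (not from the text; NumPy assumed) to compute the fundamental matrix **N**<sub>1</sub> = (**I** − **U**)<sup>−1</sup> and to check the basic sensitivity (11.7), *d*vec **N**<sub>1</sub> = (**N**<sub>1</sub><sup>T</sup> ⊗ **N**<sub>1</sub>) *d*vec **U**, against a finite-difference perturbation of a single entry of **U**.

```python
import numpy as np

# Hypothetical column-stochastic transient matrix U (3 transient states);
# column sums are < 1, the deficit being the probability of absorption.
U = np.array([[0.5, 0.1, 0.0],
              [0.3, 0.6, 0.2],
              [0.0, 0.2, 0.7]])
s = U.shape[0]
I = np.eye(s)

N1 = np.linalg.inv(I - U)                  # fundamental matrix

# Sensitivity (11.7): d vec N1 = (N1' kron N1) d vec U
dvecN1 = np.kron(N1.T, N1)

# Finite-difference check on a single entry of U
eps = 1e-7
k, j = 1, 0                                # perturb u_{kj} (0-based indices)
Up = U.copy(); Up[k, j] += eps
fd = (np.linalg.inv(I - Up) - N1) / eps    # numerical dN1 / du_{kj}
col = j * s + k                            # position of (k,j) in vec U
analytic = dvecN1[:, col].reshape(s, s, order='F')
assert np.allclose(fd, analytic, atol=1e-5)
```

The `order='F'` reshape undoes the column-stacking vec operator used throughout the chapter.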

#### *11.2.2 Time to Absorption*

Let *η<sub>j</sub>* be the time to absorption starting in transient state *j*, and let **η**<sub>*k*</sub> = *E*(*η*<sub>1</sub><sup>*k*</sup>, ⋯, *η*<sub>*s*</sub><sup>*k*</sup>)<sup>T</sup> be the vector of *k*th moments. The first several of these moments are (Iosifescu 1980, Thm. 3.2)

$$\boldsymbol{\eta}\_{1}^{\mathsf{T}} = \mathbf{1}^{\mathsf{T}} \mathbf{N}\_{1} \tag{11.19}$$

$$
\boldsymbol{\eta}\_2^\mathsf{T} = \boldsymbol{\eta}\_1^\mathsf{T} \left( 2\mathbf{N}\_1 - \mathbf{I} \right) \tag{11.20}
$$

$$
\boldsymbol{\eta}\_{3}^{\mathsf{T}} = \boldsymbol{\eta}\_{1}^{\mathsf{T}} \left( 6 \mathbf{N}\_{1}^{2} - 6 \mathbf{N}\_{1} + \mathbf{I} \right) \tag{11.21}
$$

$$
\boldsymbol{\eta}\_4^\mathsf{T} = \boldsymbol{\eta}\_1^\mathsf{T} \left( 24\mathbf{N}\_1^3 - 36\mathbf{N}\_1^2 + 14\mathbf{N}\_1 - \mathbf{I} \right). \tag{11.22}
$$
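The moment formulas (11.19)–(11.22) translate directly into code. The sketch below (same hypothetical transient matrix as above, assumed only for illustration) computes the first four moment vectors and the variance of the time to absorption.

```python
import numpy as np

# Hypothetical transient matrix (column-stochastic convention)
U = np.array([[0.5, 0.1, 0.0],
              [0.3, 0.6, 0.2],
              [0.0, 0.2, 0.7]])
s = U.shape[0]
I = np.eye(s)
ones = np.ones(s)

N1 = np.linalg.inv(I - U)

eta1 = ones @ N1                                   # Eq. (11.19)
eta2 = eta1 @ (2 * N1 - I)                         # Eq. (11.20)
eta3 = eta1 @ (6 * N1 @ N1 - 6 * N1 + I)           # Eq. (11.21)
eta4 = eta1 @ (24 * np.linalg.matrix_power(N1, 3)
               - 36 * N1 @ N1 + 14 * N1 - I)       # Eq. (11.22)

# Variance of time to absorption, by starting state
var_eta = eta2 - eta1**2
assert np.all(var_eta >= 0)
```

Higher central moments (skewness, kurtosis) follow from `eta3` and `eta4` in the same way.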

**Theorem 11.2.2** *Let* **η**<sub>*k*</sub> *be the vector of the kth moments of the η<sub>i</sub>. The sensitivities of these moment vectors are*

$$d\boldsymbol{\eta}\_1 = \left(\mathbf{I} \otimes \mathbf{1}^{\mathsf{T}}\right) d\text{vec}\,\mathbf{N}\_1 \tag{11.23}$$

$$d\boldsymbol{\eta}\_2 = \left(2\mathbf{N}\_1^{\mathsf{T}} - \mathbf{I}\right) d\boldsymbol{\eta}\_1 + 2\left(\mathbf{I} \otimes \boldsymbol{\eta}\_1^{\mathsf{T}}\right) d\text{vec}\,\mathbf{N}\_1 \tag{11.24}$$

$$\begin{aligned} d\boldsymbol{\eta}\_3 &= \left(6\mathbf{N}\_1^2 - 6\mathbf{N}\_1 + \mathbf{I}\right)^{\mathsf{T}} d\boldsymbol{\eta}\_1 \\ &\quad + \left[6\left(\mathbf{N}\_1^{\mathsf{T}} \otimes \boldsymbol{\eta}\_1^{\mathsf{T}}\right) + 6\left(\mathbf{I} \otimes \boldsymbol{\eta}\_1^{\mathsf{T}}\mathbf{N}\_1\right) - 6\left(\mathbf{I} \otimes \boldsymbol{\eta}\_1^{\mathsf{T}}\right)\right] d\text{vec}\,\mathbf{N}\_1 \end{aligned} \tag{11.25}$$

$$\begin{aligned} d\boldsymbol{\eta}\_4 &= \left(24\mathbf{N}\_1^3 - 36\mathbf{N}\_1^2 + 14\mathbf{N}\_1 - \mathbf{I}\right)^{\mathsf{T}} d\boldsymbol{\eta}\_1 \\ &\quad + \left\{24\left[\left(\mathbf{N}\_1^{\mathsf{T}}\right)^2 \otimes \boldsymbol{\eta}\_1^{\mathsf{T}}\right] + 24\left(\mathbf{N}\_1^{\mathsf{T}} \otimes \boldsymbol{\eta}\_1^{\mathsf{T}}\mathbf{N}\_1\right) + 24\left(\mathbf{I} \otimes \boldsymbol{\eta}\_1^{\mathsf{T}}\mathbf{N}\_1^2\right) \right. \\ &\qquad \left. - 36\left(\mathbf{N}\_1^{\mathsf{T}} \otimes \boldsymbol{\eta}\_1^{\mathsf{T}}\right) - 36\left(\mathbf{I} \otimes \boldsymbol{\eta}\_1^{\mathsf{T}}\mathbf{N}\_1\right) + 14\left(\mathbf{I} \otimes \boldsymbol{\eta}\_1^{\mathsf{T}}\right)\right\} d\text{vec}\,\mathbf{N}\_1 \end{aligned} \tag{11.26}$$

*where d*vec **N**<sup>1</sup> *is given by* (11.7)*.*

*Proof* The derivative of **η**<sub>1</sub> is obtained (Caswell 2006) by differentiating to get *d***η**<sub>1</sub><sup>T</sup> = **1**<sup>T</sup>(*d***N**<sub>1</sub>) and then applying the vec operator. For the higher moments, consider the **η**<sub>*k*</sub> to be functions of **η**<sub>1</sub> and **N**<sub>1</sub>, and write the total differential

$$d\boldsymbol{\eta}\_k = \frac{\partial \boldsymbol{\eta}\_k}{\partial \boldsymbol{\eta}\_1^{\mathsf{T}}}\,d\boldsymbol{\eta}\_1 + \frac{\partial \boldsymbol{\eta}\_k}{\partial\,\text{vec}^{\mathsf{T}}\mathbf{N}\_1}\,d\text{vec}\,\mathbf{N}\_1.\tag{11.27}$$

The partial differentials of *η*<sup>2</sup> with respect to *η*<sup>1</sup> and **N**<sup>1</sup> are

$$\partial\_{\boldsymbol{\eta}\_1}\,\boldsymbol{\eta}\_2^{\mathsf{T}} = \left(d\boldsymbol{\eta}\_1^{\mathsf{T}}\right)\left(2\mathbf{N}\_1 - \mathbf{I}\right) \tag{11.28}$$

$$\partial\_{\mathbf{N}\_1}\,\boldsymbol{\eta}\_2^{\mathsf{T}} = 2\boldsymbol{\eta}\_1^{\mathsf{T}}\left(d\mathbf{N}\_1\right). \tag{11.29}$$

Applying the vec operator gives

$$\partial\_{\boldsymbol{\eta}\_1}\,\boldsymbol{\eta}\_2 = \left(2\mathbf{N}\_1^{\mathsf{T}} - \mathbf{I}\right) d\boldsymbol{\eta}\_1 \tag{11.30}$$

$$\partial\_{\mathbf{N}\_1}\,\boldsymbol{\eta}\_2 = 2\left(\mathbf{I} \otimes \boldsymbol{\eta}\_1^{\mathsf{T}}\right) d\text{vec}\,\mathbf{N}\_1 \tag{11.31}$$

which combine according to (11.27) to yield (11.24). The derivations of *d***η**<sub>3</sub> and *d***η**<sub>4</sub> follow the same sequence of steps; the details are shown in Appendix A.

#### *11.2.3 Number of States Visited Before Absorption*

Let *ξ<sub>i</sub>* ≥ 1 be the number of distinct transient states visited before absorption, starting in state *i*, and let **ξ**<sub>1</sub> = *E*(**ξ**). Then

$$\boldsymbol{\xi}\_1^{\mathsf{T}} = \mathbf{1}^{\mathsf{T}} \mathbf{N}\_{\text{dg}}^{-1} \mathbf{N}\_1 \tag{11.32}$$

(Iosifescu 1980, Sect. 3.2.5), where **N**<sub>dg</sub><sup>−1</sup> = (**N**<sub>dg</sub>)<sup>−1</sup>.

**Theorem 11.2.3** *Let* **ξ**<sub>1</sub> = *E*(**ξ**)*. The sensitivity of* **ξ**<sub>1</sub> *is*

$$d\boldsymbol{\xi}\_1 = \left[-\left(\mathbf{N}\_1^{\mathsf{T}} \otimes \mathbf{1}^{\mathsf{T}}\right)\left(\mathbf{N}\_{\text{dg}}^{-1} \otimes \mathbf{N}\_{\text{dg}}^{-1}\right)\mathcal{D}\left(\text{vec}\,\mathbf{I}\right) + \left(\mathbf{I} \otimes \mathbf{1}^{\mathsf{T}}\mathbf{N}\_{\text{dg}}^{-1}\right)\right] d\text{vec}\,\mathbf{N}\_1,\tag{11.33}$$

*where d*vec **N**<sub>1</sub> *is given by* (11.7)*.*

*Proof* Differentiating (11.32) yields

$$d\boldsymbol{\xi}\_1^{\mathsf{T}} = \mathbf{1}^{\mathsf{T}}\left(d\mathbf{N}\_{\text{dg}}^{-1}\right)\mathbf{N}\_1 + \mathbf{1}^{\mathsf{T}}\mathbf{N}\_{\text{dg}}^{-1}\,d\mathbf{N}\_1.\tag{11.34}$$

Applying the vec operator yields

$$d\boldsymbol{\xi}\_1 = \left(\mathbf{N}\_1^{\mathsf{T}} \otimes \mathbf{1}^{\mathsf{T}}\right) d\text{vec}\,\mathbf{N}\_{\text{dg}}^{-1} + \left(\mathbf{I} \otimes \mathbf{1}^{\mathsf{T}}\mathbf{N}\_{\text{dg}}^{-1}\right) d\text{vec}\,\mathbf{N}\_1.\tag{11.35}$$

Applying (2.82) to *d*vec **N**<sub>dg</sub><sup>−1</sup> and using (11.12) for *d*vec **N**<sub>dg</sub> gives

$$d\boldsymbol{\xi}\_1 = -\left(\mathbf{N}\_1^{\mathsf{T}} \otimes \mathbf{1}^{\mathsf{T}}\right)\left(\mathbf{N}\_{\text{dg}}^{-1} \otimes \mathbf{N}\_{\text{dg}}^{-1}\right)\mathcal{D}\left(\text{vec}\,\mathbf{I}\right) d\text{vec}\,\mathbf{N}\_1 + \left(\mathbf{I} \otimes \mathbf{1}^{\mathsf{T}}\mathbf{N}\_{\text{dg}}^{-1}\right) d\text{vec}\,\mathbf{N}\_1 \tag{11.36}$$

which simplifies to (11.33).
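Equation (11.32) is a one-liner numerically. The sketch below (hypothetical transient matrix, assumed for illustration) computes the expected number of distinct states visited and checks the obvious bounds: every trajectory visits at least its starting state and at most all *s* transient states.

```python
import numpy as np

U = np.array([[0.5, 0.1, 0.0],
              [0.3, 0.6, 0.2],
              [0.0, 0.2, 0.7]])
s = U.shape[0]
N1 = np.linalg.inv(np.eye(s) - U)
Ndg = np.diag(np.diag(N1))                  # N_dg: diagonal matrix of diag(N1)

xi1 = np.ones(s) @ np.linalg.inv(Ndg) @ N1  # Eq. (11.32)

# Each entry of N_dg^{-1} N1 is the probability of ever visiting state i
# from state j, so xi1 lies between 1 and s.
assert np.all(xi1 >= 1 - 1e-12) and np.all(xi1 <= s + 1e-12)
```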

#### *11.2.4 Multiple Absorbing States and Probabilities of Absorption*

When the chain includes *a >* 1 absorbing states, the entry *m<sub>ij</sub>* of the *a* × *s* submatrix **M** in (11.1) is the probability of transition from transient state *j* to absorbing state *i*. The result of the competing risks of absorption is a set of probabilities *b<sub>ij</sub>* = *P*(absorption in *i* | starting in *j*) for *i* = 1*,...,a* and *j* = 1*,...,s*. The matrix **B** = (*b<sub>ij</sub>*) satisfies **B** = **MN**<sub>1</sub> (Iosifescu 1980, Thm. 3.3).
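A minimal sketch of the competing-risks calculation, with a hypothetical transient matrix and two absorbing states (the 40/60 split of the absorption probabilities is an arbitrary assumption for illustration):

```python
import numpy as np

U = np.array([[0.5, 0.1, 0.0],
              [0.3, 0.6, 0.2],
              [0.0, 0.2, 0.7]])
s = U.shape[0]

# Hypothetical 2 x 3 matrix M of transient-to-absorbing probabilities;
# each column of the full transition matrix [U; M] must sum to 1.
deficit = 1.0 - U.sum(axis=0)                   # total absorption prob per column
M = np.vstack([0.4 * deficit, 0.6 * deficit])   # split between 2 absorbing states

N1 = np.linalg.inv(np.eye(s) - U)
B = M @ N1                                       # absorption probabilities

# Absorption is certain, so each column of B is a probability distribution
assert np.allclose(B.sum(axis=0), 1.0)
assert np.all(B >= 0)
```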

**Theorem 11.2.4** *Let* **B** = **MN**<sub>1</sub> *be the matrix of absorption probabilities. Then*

$$d\text{vec}\,\mathbf{B} = \left(\mathbf{N}\_1^{\mathsf{T}} \otimes \mathbf{I}\right) d\text{vec}\,\mathbf{M} + \left(\mathbf{N}\_1^{\mathsf{T}} \otimes \mathbf{B}\right) d\text{vec}\,\mathbf{U}.\tag{11.37}$$

*Proof* Differentiating **B** yields

$$d\mathbf{B} = \left(d\mathbf{M}\right)\mathbf{N}\_1 + \mathbf{M}\left(d\mathbf{N}\_1\right). \tag{11.38}$$

Applying the vec operator gives

$$d\text{vec}\,\mathbf{B} = \left(\mathbf{N}\_1^{\mathsf{T}} \otimes \mathbf{I}\right) d\text{vec}\,\mathbf{M} + \left(\mathbf{I} \otimes \mathbf{M}\right) d\text{vec}\,\mathbf{N}\_1.\tag{11.39}$$

Substituting (11.7) for *d*vec **N**<sup>1</sup> and simplifying gives (11.37).

Column *j* of **B** is the probability distribution of the eventual absorption state for an individual starting in transient state *j*. Usually a few of those starting states are of particular interest (e.g., states corresponding to "birth" or to the start of some process). Let **B**(:*,j*) = **Be**<sub>*j*</sub> denote column *j* of **B**, where **e**<sub>*j*</sub> is the *j*th unit vector of length *s*. Thus the derivative of **B**(:*,j*) is

$$d\text{vec}\,\mathbf{B}(:,j) = \left(\mathbf{e}\_j^{\mathsf{T}} \otimes \mathbf{I}\_s\right) d\text{vec}\,\mathbf{B} \tag{11.40}$$

where *d*vec **B** is given by (11.37). Similarly, row *i* of **B** is **B**(*i,* :) = **e**<sub>*i*</sub><sup>T</sup>**B**, and

$$d\text{vec}\,\mathbf{B}(i,:) = \left(\mathbf{I}\_s \otimes \mathbf{e}\_i^{\mathsf{T}}\right) d\text{vec}\,\mathbf{B} \tag{11.41}$$

where **e***<sup>i</sup>* is the *i*th unit vector of length *a*.

#### *11.2.5 The Quasistationary Distribution*

The quasistationary distribution of an absorbing Markov chain gives the limiting probability distribution, over the set of transient states, of the state of an individual that has yet to be absorbed. Let **w** and **v** be the right and left eigenvectors associated with the dominant eigenvalue of **U**, normalized so that ‖**w**‖ = ‖**v**‖ = 1. Darroch and Seneta (1965) defined two quasistationary distributions in terms of **w** and **v**. The limiting probability distribution of the state of an individual, given that absorption has not yet happened, converges to

$$\mathbf{q}\_a = \mathbf{w} \tag{11.42}$$

The limiting probability distribution of the state of an individual, given that absorption has not happened and will not happen for a long time, is

$$\mathbf{q}\_b = \frac{\mathbf{w} \circ \mathbf{v}}{\mathbf{w}^{\mathsf{T}} \mathbf{v}} \tag{11.43}$$

Horvitz and Tuljapurkar (2008) pointed out that the convergence to the quasistationary distribution implies that, in a stage-classified model, mortality eventually becomes independent of age.
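Both quasistationary distributions are easy to compute from the dominant eigenvalue problem for **U**. The sketch below (hypothetical transient matrix, assumed only for illustration) normalizes both eigenvectors to probability distributions before forming **q**<sub>*a*</sub> and **q**<sub>*b*</sub>.

```python
import numpy as np

U = np.array([[0.5, 0.1, 0.0],
              [0.3, 0.6, 0.2],
              [0.0, 0.2, 0.7]])

# Right eigenvector of U at the dominant eigenvalue (real by Perron-Frobenius)
lam, W = np.linalg.eig(U)
i = np.argmax(lam.real)
lam1 = lam[i].real
w = np.abs(W[:, i].real); w /= w.sum()

# Left eigenvector of U = right eigenvector of U'
lamL, V = np.linalg.eig(U.T)
j = np.argmax(lamL.real)
v = np.abs(V[:, j].real); v /= v.sum()

qa = w                                  # Eq. (11.42)
qb = (w * v) / (w @ v)                  # Eq. (11.43), elementwise product

assert 0 < lam1 < 1
assert np.allclose(qa.sum(), 1.0) and np.allclose(qb.sum(), 1.0)
```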

**Lemma 1** *Let the dominant eigenvalue of* **U***, guaranteed real and nonnegative by the Perron-Frobenius theorem, satisfy* 0 *< λ <* 1*, and let* **w** *and* **v** *be the right and left eigenvectors corresponding to λ, scaled so that* **w**<sup>T</sup>**v** = 1*. Then*

$$d\mathbf{w} = \left(\lambda \mathbf{I}\_s - \mathbf{U} + \mathbf{w} \mathbf{1}^\mathsf{T} \mathbf{U}\right)^{-1} \left[\mathbf{w}^\mathsf{T} \otimes \left(\mathbf{I}\_s - \mathbf{w} \mathbf{1}^\mathsf{T}\right)\right] d\mathbf{vec} \,\mathbf{U} \qquad (11.44)$$

$$d\mathbf{v} = \left(\lambda \mathbf{I}\_s - \mathbf{U}^\mathsf{T} + \mathbf{v} \mathbf{e}\_1^\mathsf{T} \mathbf{U}^\mathsf{T}\right)^{-1} \left[ \left(\mathbf{I}\_s - \mathbf{v} \mathbf{e}\_1^\mathsf{T}\right) \otimes \mathbf{v}^\mathsf{T} \right] d\mathbf{vec} \,\mathbf{U} \qquad (11.45)$$

*Proof* Equation (11.44) is proven in Caswell (2008, Section 6.1). Equation (11.45) is obtained by treating **v** as the right eigenvector of **U**<sup>T</sup>.

**Theorem 11.2.5** *The derivative of the quasistationary distribution* **q***<sup>a</sup> is given by* (11.44)*. The derivative of the quasistationary distribution* **q***<sup>b</sup> is*

$$d\mathbf{q}\_b = \frac{1}{\mathbf{v}^{\mathsf{T}}\mathbf{w}} \left[ \left( \mathcal{D}\left(\mathbf{v}\right) - \mathbf{q}\_b \mathbf{v}^{\mathsf{T}} \right) d\mathbf{w} + \left( \mathcal{D}\left(\mathbf{w}\right) - \mathbf{q}\_b \mathbf{w}^{\mathsf{T}} \right) d\mathbf{v} \right] \tag{11.46}$$

*where d***w** *and d***v** *are given by* (11.44) *and* (11.45) *respectively.*

*Proof* The derivative of **q***<sup>a</sup>* follows from its definition as the scaled right eigenvector of **U**. For **q***b*, differentiating (11.43) gives

$$d\mathbf{q}\_b = \frac{1}{\left(\mathbf{v}^\mathsf{T}\mathbf{w}\right)^2} \left\{ \left(\mathbf{v}^\mathsf{T}\mathbf{w}\right)d\left(\mathbf{v}\circ\mathbf{w}\right) - \left(\mathbf{v}\circ\mathbf{w}\right)\left[\left(d\mathbf{v}^\mathsf{T}\right)\mathbf{w} + \mathbf{v}^\mathsf{T}\left(d\mathbf{w}\right)\right] \right\} \tag{11.47}$$

$$= \frac{1}{\mathbf{v}^{\mathsf{T}}\mathbf{w}} \left[ d\left(\mathbf{v} \circ \mathbf{w}\right) - \mathbf{q}\_b\left(d\mathbf{v}^{\mathsf{T}}\right)\mathbf{w} - \mathbf{q}\_b\mathbf{v}^{\mathsf{T}}\left(d\mathbf{w}\right)\right] \tag{11.48}$$

Applying the vec operator gives

$$d\mathbf{q}\_b = \frac{1}{\mathbf{v}^{\mathsf{T}}\mathbf{w}} \left[ \mathcal{D}\left(\mathbf{v}\right)d\mathbf{w} + \mathcal{D}\left(\mathbf{w}\right)d\mathbf{v} - \left(\mathbf{w}^{\mathsf{T}}\otimes\mathbf{q}\_b\right)d\mathbf{v} - \mathbf{q}\_b\mathbf{v}^{\mathsf{T}}d\mathbf{w} \right] \tag{11.49}$$

which simplifies to give (11.46).

#### **11.3 Life Lost Due to Mortality**

The approach here makes it easy to compute the sensitivity of a variety of dependent variables calculated from the Markov chain. As an example of this flexibility, consider a recently developed demographic index, the number of years of life lost due to mortality (Vaupel and Canudas Romo 2003).

The transient states of the chain are age classes, absorption corresponds to death, and absorbing states correspond to age at death. Let *μ<sub>i</sub>* be the mortality rate and *p<sub>i</sub>* = exp(−*μ<sub>i</sub>*) the survival probability at age *i*. The matrix **U** has the *p<sub>i</sub>* on the subdiagonal and zeros elsewhere. The matrix **M** has 1 − *p<sub>i</sub>* on the diagonal and zeros elsewhere. Let **f** = **B**(:*,* 1) be the distribution of age at death and **η**<sub>1</sub> the vector of expected longevity as a function of age.

A death at age *i* represents the loss of some number of years of life beyond that age. The expectation of that loss is given by the *i*th entry of **η**<sub>1</sub>, and the expected number of years lost over the distribution of age at death is *η*<sup>†</sup> = **η**<sub>1</sub><sup>T</sup>**f**. This quantity also measures the disparity among individuals in longevity (Vaupel and Canudas Romo 2003). If everyone died at the identical age *x*, **f** would be a delta function at *x* and further life expectancy at age *x* would be zero; their product would give *η*<sup>†</sup> = 0. Declines in disparity have accompanied increases in life expectancy observed in developed countries (Edwards and Tuljapurkar 2005; Wilmoth and Horiuchi 1999). Thus it is useful to know how *η*<sup>†</sup> responds to changes in mortality.
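A toy version of this calculation, with a hypothetical mortality schedule (not the India or Japan data). For the sketch, the last, open-ended age class is given a survival self-loop so that every column of the full chain sums to 1; that is a modeling convenience of this example, not a prescription from the text.

```python
import numpy as np

# Hypothetical mortality rates over s age classes
mu = np.array([0.05, 0.01, 0.02, 0.05, 0.1, 0.2, 0.4, 0.8])
p = np.exp(-mu)                                  # survival probabilities
s = len(mu)

# U: survival on the subdiagonal; last class survives within itself
U = np.zeros((s, s))
U[np.arange(1, s), np.arange(s - 1)] = p[:-1]
U[s - 1, s - 1] = p[-1]
M = np.diag(1 - p)                               # death probabilities

N1 = np.linalg.inv(np.eye(s) - U)
B = M @ N1
f = B[:, 0]                                      # distribution of age at death
eta1 = np.ones(s) @ N1                           # remaining life expectancy by age
eta_dagger = eta1 @ f                            # mean years of life lost

assert np.allclose(f.sum(), 1.0)
assert eta_dagger > 0
```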

Differentiating *η*† gives

$$d\eta^{\dagger} = \left(d\boldsymbol{\eta}\_1^{\mathsf{T}}\right)\mathbf{B}\mathbf{e}\_1 + \boldsymbol{\eta}\_1^{\mathsf{T}}\left(d\mathbf{B}\right)\mathbf{e}\_1.\tag{11.50}$$

Applying the vec operator gives

$$d\eta^{\dagger} = \mathbf{e}\_1^{\mathsf{T}}\mathbf{B}^{\mathsf{T}}\,d\boldsymbol{\eta}\_1 + \left(\mathbf{e}\_1^{\mathsf{T}} \otimes \boldsymbol{\eta}\_1^{\mathsf{T}}\right) d\text{vec}\,\mathbf{B}.\tag{11.51}$$

Substituting (11.23) for *dη*<sup>1</sup> and (11.37) for *d*vec **B** gives

$$\begin{aligned} d\eta^{\dagger} &= \mathbf{f}^{\mathsf{T}}\left(\mathbf{I} \otimes \mathbf{1}^{\mathsf{T}}\right) d\text{vec}\,\mathbf{N}\_1 \\ &\quad + \left(\mathbf{e}\_1^{\mathsf{T}} \otimes \boldsymbol{\eta}\_1^{\mathsf{T}}\right)\left[\left(\mathbf{N}\_1^{\mathsf{T}} \otimes \mathbf{I}\right) d\text{vec}\,\mathbf{M} + \left(\mathbf{N}\_1^{\mathsf{T}} \otimes \mathbf{B}\right) d\text{vec}\,\mathbf{U}\right] \end{aligned} \tag{11.52}$$

Simplifying and writing derivatives in terms of *μ* gives

$$\frac{d\eta^{\dagger}}{d\boldsymbol{\mu}^{\mathsf{T}}} = \left[\mathbf{f}^{\mathsf{T}}\left(\mathbf{N}\_1^{\mathsf{T}} \otimes \boldsymbol{\eta}\_1^{\mathsf{T}}\right) + \left(\mathbf{e}\_1^{\mathsf{T}}\mathbf{N}\_1^{\mathsf{T}} \otimes \boldsymbol{\eta}\_1^{\mathsf{T}}\mathbf{B}\right)\right]\frac{d\text{vec}\,\mathbf{U}}{d\boldsymbol{\mu}^{\mathsf{T}}} + \left(\mathbf{e}\_1^{\mathsf{T}}\mathbf{N}\_1^{\mathsf{T}} \otimes \boldsymbol{\eta}\_1^{\mathsf{T}}\right)\frac{d\text{vec}\,\mathbf{M}}{d\boldsymbol{\mu}^{\mathsf{T}}}\tag{11.53}$$

Because mortality rates vary over several orders of magnitude with age, it is useful to present the results as elasticities,

$$\frac{\epsilon\eta^{\dagger}}{\epsilon\boldsymbol{\mu}^{\mathsf{T}}} = \frac{1}{\eta^{\dagger}}\,\frac{d\eta^{\dagger}}{d\boldsymbol{\mu}^{\mathsf{T}}}\,\mathcal{D}\,(\boldsymbol{\mu}).\tag{11.54}$$

Figure 11.1 shows these elasticities for two populations chosen to have very different life expectancies: India in 1961, with female life expectancy of 45 years and *η*<sup>†</sup> = 23.9 years, and Japan in 2006, with female life expectancy of 86 years and *η*<sup>†</sup> = 10.1 years (Human Mortality Database 2016). In both cases, elasticities are positive from birth to some age (≈50 for India, ≈85 for Japan) and negative thereafter. This implies that reductions in infant and early life mortality would reduce *η*<sup>†</sup>, whereas reductions in old age mortality would increase *η*<sup>†</sup>. Zhang and Vaupel (2009) have shown that the existence of such a critical age is a general property of these models.

**Fig. 11.1** The elasticity of mean years of life lost due to mortality, *η*†, to changes in age-specific mortality, calculated from the female life tables of India in 1961 and of Japan in 2006. (Data obtained from the Human Mortality Database 2016)

#### **11.4 Ergodic Chains**

Now let us consider perturbations of an ergodic finite-state Markov chain with an irreducible, primitive, column-stochastic transition matrix **P** of dimension *s* × *s*. The stationary distribution *π* is given by the right eigenvector, scaled to sum to 1, corresponding to the dominant eigenvalue *λ*<sub>1</sub> = 1 of **P**. The fundamental matrix of the chain is **Z** = (**I** − **P** + *π***1**<sup>T</sup>)<sup>−1</sup> (Kemeny and Snell 1960).
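A minimal numerical sketch of these two objects, using a hypothetical column-stochastic **P** (assumed only for illustration):

```python
import numpy as np

# Hypothetical column-stochastic transition matrix (columns sum to 1)
P = np.array([[0.6, 0.2, 0.1],
              [0.3, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
s = P.shape[0]

# Stationary distribution: right eigenvector at eigenvalue 1, scaled to sum to 1
lam, W = np.linalg.eig(P)
i = np.argmin(np.abs(lam - 1))
pi = W[:, i].real
pi /= pi.sum()

# Fundamental matrix Z = (I - P + pi 1')^{-1}
Z = np.linalg.inv(np.eye(s) - P + np.outer(pi, np.ones(s)))

assert np.allclose(P @ pi, pi)       # stationarity
assert np.allclose(Z @ pi, pi)       # Z pi = pi, used in the proofs below
```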

We are interested only in perturbations that preserve the column-stochasticity of **P**; i.e., for which **P** remains a stochastic matrix. Such perturbations are easily defined when the *pij* depend explicitly on a parameter vector *θ*. However, when the parameters of interest are the *pij* themselves, an implicit parameterization must be defined to preserve the stochastic nature of **P** under perturbation (Conlisk 1985; Caswell 2001). In Sect. 11.4.5 we will explore new expressions for two different forms of implicit parameterization.

Previous studies of perturbations of ergodic chains focus almost completely on perturbations of the stationary distribution, and are divided between those focusing on sensitivity as a derivative (e.g., Schweitzer 1968; Conlisk 1985; Golub and Meyer 1986) and studies focusing on perturbation bounds and condition numbers (Funderlic and Meyer 1986; Meyer 1994; Seneta 1988; Hunter 2005; Kirkland 2003); for reviews see Cho and Meyer (2000) and Kirkland et al. (2008). The approach here is similar in spirit to that of Schweitzer (1968), Conlisk (1985), and Golub and Meyer (1986), in that we focus on derivatives of Markov chain properties with respect to parameter perturbations, but taking advantage of the matrix calculus approach. We do not consider perturbation bounds here.

#### *11.4.1 The Stationary Distribution*

**Theorem 11.4.1** *Let π be the stationary distribution, satisfying* **P***π* = *π and* **1**<sup>T</sup>*π* = 1*. The sensitivity of π is*

$$d\pi = \left[\pi^{\mathsf{T}} \otimes \left(\mathbf{Z} - \pi \mathbf{1}^{\mathsf{T}}\right)\right] d\mathsf{vec} \,\mathbf{P} \tag{11.55}$$

*where* **Z** *is the fundamental matrix of the chain.*

*Proof* The vector *π* is the right eigenvector of **P**, scaled to sum to 1. Applying Lemma 1, and noting that *λ* = 1 and **1**<sup>T</sup>**P** = **1**<sup>T</sup>, gives *dπ* = **Z**[*π*<sup>T</sup> ⊗ (**I**<sub>*s*</sub> − *π***1**<sup>T</sup>)] *d*vec **P**. Noting that **Z***π* = *π* and simplifying the Kronecker products yields (11.55).
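The result (11.55) can be checked against a finite difference; perturbing a single entry of **P** is the unconstrained derivative the theorem describes (compensation to preserve column sums is treated in Sect. 11.4.5). The matrix below is hypothetical.

```python
import numpy as np

def stat_dist(P):
    """Dominant right eigenvector of P, scaled to sum to 1."""
    lam, W = np.linalg.eig(P)
    i = np.argmax(lam.real)
    w = W[:, i].real
    return w / w.sum()

P = np.array([[0.6, 0.2, 0.1],
              [0.3, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
s = P.shape[0]
ones = np.ones(s)
pi = stat_dist(P)
Z = np.linalg.inv(np.eye(s) - P + np.outer(pi, ones))

# Eq. (11.55): dpi = [pi' kron (Z - pi 1')] dvec P
dpi = np.kron(pi[None, :], Z - np.outer(pi, ones))   # s x s^2 Jacobian

# Finite-difference check on a single entry p_{kj}
eps = 1e-7
k, j = 2, 0
Pp = P.copy(); Pp[k, j] += eps
fd = (stat_dist(Pp) - pi) / eps
assert np.allclose(fd, dpi[:, j * s + k], atol=1e-5)
```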

Based on an analysis of eigenvector sensitivity (Meyer and Stewart 1982), Golub and Meyer (1986) derived an expression for the derivative of *π* with respect to a change in a single element of **P**, using the group generalized inverse (**I** − **P**)<sup>#</sup> of **I** − **P**. Since (**I** − **P**)<sup>#</sup> = **Z** − *π***1**<sup>T</sup> (Golub and Meyer 1986), expression (11.55) is exactly the Golub-Meyer result expressed in matrix calculus notation. Our results here permit sensitivity analysis of functions of *π* using only the chain rule. If **g**(*π*) is a vector- or scalar-valued function of *π*, then

$$d\mathbf{g}(\boldsymbol{\pi}) = \frac{d\mathbf{g}}{d\boldsymbol{\pi}^{\mathsf{T}}}\,\frac{d\boldsymbol{\pi}}{d\text{vec}^{\mathsf{T}}\mathbf{P}}\,d\text{vec}\,\mathbf{P} \tag{11.56}$$

Some examples will appear in Sect. 11.5.

#### *11.4.2 The Fundamental Matrix*

The fundamental matrix **Z** = (**I** − **P** + *π***1**<sup>T</sup>)<sup>−1</sup> plays a role in ergodic chains similar to that played by **N**<sub>1</sub> in absorbing chains (Kemeny and Snell 1960). It has been extended using generalized inverses (Meyer 1975; Kemeny 1981), but we do not consider those extensions here.

**Theorem 11.4.2** *The sensitivity of the fundamental matrix is*

$$d\text{vec}\,\mathbf{Z} = \left(\mathbf{Z}^{\mathsf{T}} \otimes \mathbf{Z}\right)\left\{\mathbf{I}\_{s^2} - \left[\mathbf{1}\boldsymbol{\pi}^{\mathsf{T}} \otimes \left(\mathbf{Z} - \boldsymbol{\pi}\mathbf{1}^{\mathsf{T}}\right)\right]\right\} d\text{vec}\,\mathbf{P} \tag{11.57}$$

*Proof* From (2.82),

$$d\text{vec}\,\mathbf{Z} = -\left(\mathbf{Z}^{\mathsf{T}} \otimes \mathbf{Z}\right)d\text{vec}\,\left(\mathbf{I} - \mathbf{P} + \pi\mathbf{1}^{\mathsf{T}}\right) \tag{11.58}$$

$$= \left(\mathbf{Z}^{\mathsf{T}} \otimes \mathbf{Z}\right) \left(d \mathbf{vec} \, \mathbf{P} - (\mathbf{1} \otimes \mathbf{I}\_s) \, d\mathbf{\pi}\right) \tag{11.59}$$

Substituting (11.55) for *dπ* and simplifying gives (11.57).

#### *11.4.3 The First Passage Time Matrix*

Let **R** = (*r<sub>ij</sub>*) be the matrix of mean first passage times from *j* to *i*, given by Iosifescu (1980, Thm. 4.7):

$$\mathbf{R} = \mathcal{D} \left( \boldsymbol{\pi} \right)^{-1} \left( \mathbf{I} - \mathbf{Z} + \mathbf{Z}\_{\text{dg}} \mathbf{E} \right) . \tag{11.60}$$

Again, this is the transpose of the expression obtained when **P** is row-stochastic.

**Theorem 11.4.3** *The sensitivity of* **R** *is*

$$\begin{aligned} d\text{vec}\,\mathbf{R} &= -\left[\mathbf{R}^{\mathsf{T}} \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1}\right]\mathcal{D}\,(\text{vec}\,\mathbf{I}\_s)\,(\mathbf{1} \otimes \mathbf{I}\_s)\,d\boldsymbol{\pi} \\ &\quad - \left\{\left[\mathbf{I}\_s \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1}\right] - \left[\mathbf{E} \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1}\right]\mathcal{D}\,(\text{vec}\,\mathbf{I}\_s)\right\} d\text{vec}\,\mathbf{Z} \end{aligned} \tag{11.61}$$

*where dπ is given by* (11.55) *and d*vec **Z** *is given by* (11.57)*.*

*Proof* Differentiating (11.60) gives

$$d\mathbf{R} = d\left[\mathcal{D}\left(\boldsymbol{\pi}\right)^{-1}\right] \left(\mathbf{I} - \mathbf{Z} + \mathbf{Z\_{dg}}\mathbf{E}\right) + \mathcal{D}\left(\boldsymbol{\pi}\right)^{-1} \left[-d\mathbf{Z} + \left(d\mathbf{Z\_{dg}}\right)\mathbf{E}\right].\tag{11.62}$$

Applying the vec operator gives

$$\begin{aligned} d\text{vec}\,\mathbf{R} &= \left[\left(\mathbf{I} - \mathbf{Z} + \mathbf{Z}\_{\text{dg}}\mathbf{E}\right)^{\mathsf{T}} \otimes \mathbf{I}\_s\right] d\text{vec}\left[\mathcal{D}\,(\boldsymbol{\pi})^{-1}\right] \\ &\quad - \left[\mathbf{I}\_s \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1}\right] d\text{vec}\,\mathbf{Z} + \left[\mathbf{E} \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1}\right] d\text{vec}\,\mathbf{Z}\_{\text{dg}}. \end{aligned} \tag{11.63}$$

Using (2.82) for *d*vec D(*π*)<sup>−1</sup>, (2.69) for *d*vec D(*π*), and (11.12) for *d*vec **Z**<sub>dg</sub> yields

$$\begin{aligned} d\text{vec}\,\mathbf{R} &= -\left[\mathbf{R}^{\mathsf{T}}\mathcal{D}\,(\boldsymbol{\pi}) \otimes \mathbf{I}\_s\right]\left[\mathcal{D}\,(\boldsymbol{\pi})^{-1} \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1}\right]\mathcal{D}\,(\text{vec}\,\mathbf{I})\,(\mathbf{1} \otimes \mathbf{I})\,d\boldsymbol{\pi} \\ &\quad - \left[\mathbf{I} \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1}\right] d\text{vec}\,\mathbf{Z} + \left[\mathbf{E} \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1}\right]\mathcal{D}\,(\text{vec}\,\mathbf{I})\,d\text{vec}\,\mathbf{Z} \end{aligned} \tag{11.64}$$

which simplifies to give (11.61).


#### *11.4.4 Mixing Time and the Kemeny Constant*

The mixing time *K* of a chain is the mean time required to get from a specified state to a state chosen at random from the stationary distribution *π*. Remarkably, *K* is independent of the starting state (Grinstead and Snell 2003; Hunter 2006) and is sometimes called Kemeny's constant; it is a measure of the rate of convergence to stationarity, and is given by *K* = trace(**Z**) (Hunter 2006). In addition to being a quantity of interest in itself, the rate of convergence also plays a role in the sensitivity of the stationary distribution of ergodic chains (Hunter 2005; Mitrophanov 2005).
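The independence of *K* from the starting state can be seen numerically: weighting the first passage times (11.60) by *π* gives the same value for every starting column. The transition matrix below is hypothetical.

```python
import numpy as np

P = np.array([[0.6, 0.2, 0.1],
              [0.3, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
s = P.shape[0]
lam, W = np.linalg.eig(P)
i = np.argmax(lam.real)
pi = W[:, i].real
pi /= pi.sum()

I = np.eye(s)
E = np.ones((s, s))
Z = np.linalg.inv(I - P + np.outer(pi, np.ones(s)))
K = np.trace(Z)                                  # Kemeny's constant

# Mean first passage times, Eq. (11.60)
R = np.linalg.inv(np.diag(pi)) @ (I - Z + np.diag(np.diag(Z)) @ E)

# The pi-weighted mean first passage time is K regardless of starting state
assert np.allclose(pi @ R, K)
```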

**Theorem 11.4.4** *The sensitivity of K is*

$$dK = (\mathbf{vec} \, \mathbf{I}\_s)^\mathsf{T} d \mathbf{vec} \, \mathbf{Z}. \tag{11.65}$$

*Proof* Differentiating *K* = trace*(***Z***)* gives

$$dK = \mathbf{1}^{\mathsf{T}}\left(\mathbf{I} \circ d\mathbf{Z}\right)\mathbf{1}.\tag{11.66}$$

Applying the vec operator gives

$$dK = \left(\mathbf{1}^{\mathsf{T}} \otimes \mathbf{1}^{\mathsf{T}}\right)\mathcal{D}\left(\text{vec}\,\mathbf{I}\right) d\text{vec}\,\mathbf{Z} \tag{11.67}$$

which simplifies to (11.65).

#### *11.4.5 Implicit Parameters and Compensation*

Theorems 11.4.1, 11.4.2, 11.4.3, and 11.4.4 are written in terms of *d*vec **P**. However, perturbation of any element, say *p<sub>kj</sub>*, to *p<sub>kj</sub>* + *θ<sub>kj</sub>*, must be compensated for by adjustments of the other elements in column *j* so that the column sum remains equal to 1 (Conlisk 1985). Two kinds of compensation are likely to be of use in applications: additive and proportional. Additive compensation adjusts all the elements of the column by an equal amount, distributing the perturbation *θ<sub>kj</sub>* additively over column *j*. Proportional compensation distributes *θ<sub>kj</sub>* in proportion to the values of the *p<sub>ij</sub>*, for *i* ≠ *k*. Proportional compensation is attractive because it preserves the pattern of zero and non-zero elements within **P**.

To develop the compensation formulae, let us start by considering a probability vector **p**, of dimension *s* × 1, with *p<sub>i</sub>* ≥ 0 and ∑<sub>*i*</sub> *p<sub>i</sub>* = 1. Let *θ<sub>i</sub>* be the perturbation of *p<sub>i</sub>*, and write

$$\mathbf{p}(\theta) = \mathbf{p}(0) + \mathbf{A}\theta \tag{11.68}$$

for some matrix **A** to be determined. If *y* is a function of **p**, then

$$d\mathbf{y} = \frac{d\mathbf{y}}{d\mathbf{p}^{\mathsf{T}}} \frac{d\mathbf{p}}{d\boldsymbol{\theta}^{\mathsf{T}}} \, d\boldsymbol{\theta} \tag{11.69}$$

evaluated at *θ* = 0.

**Additive compensation** For the case of additive compensation, we write

$$\begin{aligned} p\_1(\theta) &= p\_1(0) + \theta\_1 - \frac{\theta\_2}{s - 1} - \dots - \frac{\theta\_s}{s - 1} \\ p\_2(\theta) &= p\_2(0) - \frac{\theta\_1}{s - 1} + \theta\_2 - \dots - \frac{\theta\_s}{s - 1} \\ &\vdots \\ p\_s(\theta) &= p\_s(0) - \frac{\theta\_1}{s - 1} - \frac{\theta\_2}{s - 1} - \dots + \theta\_s \end{aligned} \tag{11.70}$$

The perturbation *θ*<sub>1</sub> is added to *p*<sub>1</sub> and compensated for by subtracting *θ*<sub>1</sub>/(*s* − 1) from all other entries of **p**; clearly ∑<sub>*i*</sub> *p<sub>i</sub>*(*θ*) = 1 for any perturbation vector *θ*.

The system of Eqs. (11.70) can be written

$$\mathbf{p}(\theta) = \mathbf{p}(0) + \left(\mathbf{I} - \frac{1}{s - 1}\mathbf{C}\right)\theta. \tag{11.71}$$

Defining **E** to be a matrix of ones, the matrix **C** can be written as the Toeplitz matrix **C** = **E** − **I**, with zeros on the diagonal and ones elsewhere. Thus the matrix **A** in (11.68) is

$$\mathbf{A} = \mathbf{I} - \frac{1}{s - 1}\mathbf{C} \tag{11.72}$$

**Proportional compensation** For proportional compensation, assume that *pi <* 1 for all *i*. The vector **p***(θ)* is

$$\begin{aligned} p\_1(\theta) &= p\_1(0) + \theta\_1 - \frac{p\_1\theta\_2}{1 - p\_2} - \dots - \frac{p\_1\theta\_s}{1 - p\_s} \\\\ p\_2(\theta) &= p\_2(0) - \frac{p\_2\theta\_1}{1 - p\_1} + \theta\_2 - \dots - \frac{p\_2\theta\_s}{1 - p\_s} \\\\ &\vdots \\\\ p\_s(\theta) &= p\_s(0) - \frac{p\_s\theta\_1}{1 - p\_1} - \frac{p\_s\theta\_2}{1 - p\_2} - \dots + \theta\_s \end{aligned} \tag{11.73}$$

The perturbation *θ*<sub>1</sub> is added to *p*<sub>1</sub> and compensated for by subtracting *θ*<sub>1</sub>*p<sub>i</sub>*/(1 − *p*<sub>1</sub>) from the *i*th entry of **p**. Again, ∑<sub>*i*</sub> *p<sub>i</sub>*(*θ*) = 1 for any perturbation vector *θ*.

Equation (11.73) can be written

$$\mathbf{p}(\theta) = \mathbf{p}(0) + \left[\mathbf{I} - \mathcal{D}\left(\mathbf{p}\right)\mathbf{C}\,\mathcal{D}\left(\mathbf{1} - \mathbf{p}\right)^{-1}\right]\theta \tag{11.74}$$

so that the matrix **A** in (11.68) is

$$\mathbf{A} = \mathbf{I} - \mathcal{D}\left(\mathbf{p}\right) \mathbf{C} \mathcal{D}\left(\mathbf{1} - \mathbf{p}\right)^{-1} \tag{11.75}$$
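Both compensation matrices are easy to build and test. The sketch below (hypothetical probability vector and perturbations, assumed for illustration) constructs **A** for additive compensation, Eq. (11.72), and for proportional compensation, Eq. (11.75), and verifies that the perturbed vector still sums to 1.

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])                    # a probability vector
s = len(p)
C = np.ones((s, s)) - np.eye(s)                  # C = E - I

A_add = np.eye(s) - C / (s - 1)                                       # Eq. (11.72)
A_prop = np.eye(s) - np.diag(p) @ C @ np.linalg.inv(np.diag(1 - p))   # Eq. (11.75)

theta = np.array([0.01, -0.005, 0.002])          # arbitrary small perturbations
for A in (A_add, A_prop):
    p_new = p + A @ theta                        # Eq. (11.68)
    assert np.isclose(p_new.sum(), 1.0)          # stochasticity preserved
```

Every column of each **A** sums to zero, which is why the total probability is invariant under any *θ*.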

**The transition matrix** We have derived compensation formulae for a single probability vector **p**. Now consider perturbation of a probability matrix **P**, each column of which is a probability vector. Define a perturbation matrix **Θ** whose entry *θ<sub>ij</sub>* is the perturbation of *p<sub>ij</sub>*. Perturbations of column *j* are to be compensated by a matrix **A**<sub>*j*</sub>, so that

$$\mathbf{P}(\boldsymbol{\Theta}) = \mathbf{P}(0) + \left[\mathbf{A}\_1\boldsymbol{\Theta}(:,1) \;\cdots\; \mathbf{A}\_s\boldsymbol{\Theta}(:,s)\right] \tag{11.76}$$

where **A***<sup>i</sup>* compensates for the changes in column *i* of **P**. Applying the vec operator to (11.76) gives

$$\text{vec}\,\mathbf{P}(\boldsymbol{\Theta}) = \text{vec}\,\mathbf{P}(0) + \begin{pmatrix} \mathbf{A}\_1 & & \\ & \ddots & \\ & & \mathbf{A}\_s \end{pmatrix}\text{vec}\,\boldsymbol{\Theta} \tag{11.77}$$

$$= \text{vec}\,\mathbf{P}(0) + \sum\_{i=1}^{s}\left(\mathbf{E}\_{ii} \otimes \mathbf{A}\_i\right)\text{vec}\,\boldsymbol{\Theta}.\tag{11.78}$$

The terms in the summation in (11.78) are recognizable as the vec of the product **A**<sub>*i*</sub> **Θ** **E**<sub>*ii*</sub>; thus

$$\mathbf{P}(\boldsymbol{\Theta}) = \mathbf{P}(0) + \sum\_{i=1}^{s}\mathbf{A}\_i\,\boldsymbol{\Theta}\,\mathbf{E}\_{ii} \tag{11.79}$$

where **E***ii* is a matrix with a 1 in the *(i, i)* entry and zeros elsewhere.

**Theorem 11.4.5** *Let* **P** *be a column-stochastic s* × *s transition matrix. Let* **Θ** *be a matrix of perturbations, where θ<sub>ij</sub> is applied to p<sub>ij</sub>, and the other entries of* **P** *compensate for the perturbation. Let* **C** = **E** − **I***. If compensation is additive, then*

$$\mathbf{P}(\Theta) = \mathbf{P}(0) + \left(\mathbf{I} - \frac{1}{s - 1}\mathbf{C}\right)\Theta \tag{11.80}$$


$$\frac{d\text{vec}\,\mathbf{P}}{d\text{vec}^{\mathsf{T}}\boldsymbol{\Theta}} = \mathbf{I}\_{s^2} - \frac{1}{s-1}\left(\mathbf{I}\_s \otimes \mathbf{C}\right).\tag{11.81}$$

*If compensation is proportional, then*

$$\mathbf{P}(\Theta) = \mathbf{P}(0) + \sum_{i=1}^{s} \left\{ \mathbf{I} - \mathcal{D}\left[\mathbf{P}(:,i)\right] \mathbf{C}\, \mathcal{D}\left[\mathbf{1} - \mathbf{P}(:,i)\right]^{-1} \right\} \Theta \mathbf{E}_{ii} \tag{11.82}$$

$$\frac{d\operatorname{vec}\mathbf{P}}{d\operatorname{vec}^{\mathsf{T}}\Theta} = \mathbf{I}_{s^2} - \sum_{i=1}^{s} \left\{ \mathbf{E}_{ii} \otimes \mathcal{D}\left[\mathbf{P}(:,i)\right] \mathbf{C}\, \mathcal{D}\left[\mathbf{1} - \mathbf{P}(:,i)\right]^{-1} \right\}. \tag{11.83}$$

*Proof* **P***(Θ)* is given by (11.79). If compensation is additive, **A***<sup>i</sup>* is given by (11.72) for all *i*. Substituting into (11.79) gives (11.80). Differentiating (11.80) and applying the vec operator gives (11.81).

If compensation is proportional, substituting (11.75) for **A***<sup>i</sup>* in (11.79) gives (11.82). Differentiating yields

$$d\mathbf{P} = \sum_{i=1}^{s} (d\Theta)\,\mathbf{E}_{ii} - \sum_{i=1}^{s} \mathcal{D}\left[\mathbf{P}(:,i)\right] \mathbf{C}\, \mathcal{D}\left[\mathbf{1} - \mathbf{P}(:,i)\right]^{-1} (d\Theta)\, \mathbf{E}_{ii}\,. \tag{11.84}$$

Using the vec operator gives (11.83).
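As a numerical check on Theorem 11.4.5, both compensation Jacobians can be constructed directly and tested on a small hypothetical column-stochastic matrix: a sketch in NumPy (the matrix `P` and the perturbation direction are made-up examples, not from the book). Compensation means that any perturbation, pushed through either Jacobian, leaves every column sum of **P** unchanged.

```python
import numpy as np

def additive_comp_jacobian(s):
    # Eq. (11.81): I_{s^2} - (1/(s-1)) (I_s kron C), with C = E - I, E a matrix of ones
    C = np.ones((s, s)) - np.eye(s)
    return np.eye(s * s) - np.kron(np.eye(s), C) / (s - 1)

def proportional_comp_jacobian(P):
    # Eq. (11.83): I_{s^2} - sum_i E_ii kron D[P(:,i)] C D[1 - P(:,i)]^{-1}
    s = P.shape[0]
    C = np.ones((s, s)) - np.eye(s)
    J = np.eye(s * s)
    for i in range(s):
        Eii = np.zeros((s, s)); Eii[i, i] = 1.0
        B = np.diag(P[:, i]) @ C @ np.diag(1.0 / (1.0 - P[:, i]))
        J -= np.kron(Eii, B)
    return J

# hypothetical 3-state column-stochastic matrix
P = np.array([[0.6, 0.2, 0.3],
              [0.3, 0.7, 0.2],
              [0.1, 0.1, 0.5]])
theta = np.random.default_rng(1).normal(size=9)   # arbitrary perturbation, vec of Theta
for J in (additive_comp_jacobian(3), proportional_comp_jacobian(P)):
    dP = (J @ theta).reshape(3, 3, order='F')     # un-vec (column-major)
    # compensation keeps every column sum of P unchanged
    assert np.allclose(dP.sum(axis=0), 0.0)
```

The column-major `order='F'` reshape matches the column-stacking convention of the vec operator.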

Perturbations of **P** subject to compensation are given by perturbations of *Θ*. Thus for any function *y(***P***)* we can write

$$\left.\frac{dy}{d\operatorname{vec}^{\mathsf{T}}\mathbf{P}}\right|_{\mathrm{comp}} = \frac{dy}{d\operatorname{vec}^{\mathsf{T}}\mathbf{P}}\;\frac{d\operatorname{vec}\mathbf{P}}{d\operatorname{vec}^{\mathsf{T}}\Theta} \tag{11.85}$$

where *d*vec **P***/d*vec <sup>T</sup>*Θ* is given (for additive and proportional compensation) by Theorem 11.4.5. The slight notational complexity is worthwhile for clarifying how to use Theorem 11.4.5 in practice.

#### **11.5 Species Succession in a Marine Community**

Markov chains are used by ecologists as models of species replacement (succession) in ecological communities (e.g., Horn 1975; Hill et al. 2004; Nelis and Wootton 2010). In these models, the state of a point on a landscape is given by the species occupying that point. The entry *pij* of **P** is the probability that species *j* is replaced by species *i* between *t* and *t* + 1. If a community consists of a large number of points independently subject to the transition probabilities in **P**, the stationary distribution *π* gives the relative frequencies of species in the community at equilibrium.
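For a column-stochastic **P**, the stationary distribution is the right eigenvector associated with the eigenvalue 1, scaled to sum to one. A minimal sketch, using a hypothetical 3-species matrix (not the Hill et al. data):

```python
import numpy as np

# hypothetical 3-species community; columns of P sum to 1 (column-stochastic),
# p_ij = probability that species j's point is occupied by species i next step
P = np.array([[0.70, 0.20, 0.10],
              [0.20, 0.60, 0.30],
              [0.10, 0.20, 0.60]])

# stationary distribution: right eigenvector of P for eigenvalue 1, scaled to sum to 1
w, v = np.linalg.eig(P)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

assert np.allclose(P @ pi, pi)   # P pi = pi
```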


Hill et al. (2004) used a Markov chain to describe a community of encrusting organisms occupying rock surfaces at 30–35 m depth in the Gulf of Maine. The Markov chain contained 14 species plus an additional state ("bare rock") for unoccupied substrate. The matrix **P** was estimated from longitudinal data (Hill et al. 2002, 2004) and is given, along with a list of species names, in Appendix B. We will use the results of this chapter to analyze the sensitivity of species diversity and the Kemeny constant to the processes of colonization and replacement that determine **P**.

#### *11.5.1 Biotic Diversity*

The stationary distribution *π*, with the species numbered in order of decreasing abundance and bare rock placed at the end as state 15, is shown in Fig. 11.2. The two dominant species are an encrusting sponge (called *Hymedesmia*) and a bryozoan (*Crisia*).

The entropy of this stationary distribution, *H(π)* = −*π*<sup>T</sup> log *π*, where the logarithm is applied elementwise, is used as an index of biodiversity; it is maximal when all species are equally abundant and goes to 0 in a community dominated by a single species. The sensitivity of *H* is

$$dH = -\left(\log \pi^{\mathsf{T}} + \mathbf{1}^{\mathsf{T}}\right) d\pi \tag{11.86}$$
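The gradient in (11.86) is easy to confirm by finite differences; a sketch with an arbitrary made-up probability vector:

```python
import numpy as np

def H(pi):
    # entropy of a positive vector, logarithm applied elementwise
    return -np.sum(pi * np.log(pi))

pi = np.array([0.5, 0.3, 0.15, 0.05])
grad = -(np.log(pi) + 1.0)            # dH/dpi' from Eq. (11.86)

# finite-difference check, component by component
eps = 1e-7
fd = np.array([(H(pi + eps * np.eye(4)[k]) - H(pi)) / eps for k in range(4)])
assert np.allclose(fd, grad, atol=1e-5)
```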

Most ecologists, however, would not include bare substrate in a measure of biodiversity, so we define instead a "biotic diversity" *Hb(π)* = *H (πb)* where

$$\pi_b = \frac{\mathbf{G}\pi}{\|\mathbf{G}\pi\|}. \tag{11.87}$$

**Fig. 11.2** The stationary distribution for the subtidal benthic community succession model of Hill et al. (2004). States 1–14 correspond to species, numbered in decreasing order of abundance in the stationary distribution. State 15 is bare rock, unoccupied by any species. For the identity of species and the transition matrix, see Appendix B

The matrix **G**, of dimension 14 × 15, is a 0–1 matrix that selects rows 1–14 of *π*. Because *π* is positive, ‖**G***π*‖ = **1**<sup>T</sup>**G***π*. Differentiating *π<sub>b</sub>* gives

$$d\pi\_b = \left(\frac{\mathbf{G}}{\mathbf{1}^\mathsf{T}\mathbf{G}\pi} - \frac{\mathbf{G}\pi\mathbf{1}^\mathsf{T}\mathbf{G}}{\left(\mathbf{1}^\mathsf{T}\mathbf{G}\pi\right)^2}\right)d\pi\tag{11.88}$$

which simplifies to

$$d\pi\_b = \left(\frac{\mathbf{G} - \pi\_b \mathbf{1}^\mathsf{T} \mathbf{G}}{\mathbf{1}^\mathsf{T} \mathbf{G} \pi}\right) d\pi \tag{11.89}$$
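The Jacobian (11.89) can be checked numerically. A sketch with a small hypothetical system (4 "species" plus one "bare rock" state, so that **G** selects the first four entries; the vector is made up):

```python
import numpy as np

s = 5
G = np.eye(s)[:-1, :]          # 0-1 matrix selecting the biotic states 1..s-1
pi = np.array([0.4, 0.25, 0.15, 0.1, 0.1])

def pib(pi):
    # biotic distribution, Eq. (11.87): G pi renormalized to sum to 1
    return G @ pi / (G @ pi).sum()

pi_b = pib(pi)
onesb = np.ones(s - 1)
# Jacobian from Eq. (11.89): (G - pi_b 1' G) / (1' G pi)
J = (G - np.outer(pi_b, onesb @ G)) / (onesb @ G @ pi)

# finite-difference check
eps = 1e-7
fd = np.column_stack([(pib(pi + eps * np.eye(s)[:, k]) - pi_b) / eps
                      for k in range(s)])
assert np.allclose(fd, J, atol=1e-5)
```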

This model contains no explicit parameters; perturbations of the transition probabilities themselves are of interest, and a compensation pattern is needed. Because the relative magnitudes of the entries in a column of **P** reflect the relative abilities of species to capture or to hold space, proportional compensation is appropriate here: it preserves these relative abilities.

The sensitivity and elasticity of the biotic diversity *Hb* to changes in the matrix **P**, subject to proportional compensation, are

$$\left. \frac{dH_b}{d\operatorname{vec}^{\mathsf{T}}\mathbf{P}} \right|_{\text{comp}} = \underbrace{\frac{dH_b}{d\pi_b^{\mathsf{T}}}}_{1} \underbrace{\frac{d\pi_b}{d\pi^{\mathsf{T}}}}_{2} \underbrace{\frac{d\pi}{d\operatorname{vec}^{\mathsf{T}}\mathbf{P}}}_{3} \underbrace{\frac{d\operatorname{vec}\mathbf{P}}{d\operatorname{vec}^{\mathsf{T}}\Theta}}_{4} \tag{11.90}$$

$$\left. \frac{\epsilon H_b}{\epsilon \operatorname{vec}^{\mathsf{T}}\mathbf{P}} \right|_{\text{comp}} = \frac{1}{H_b} \left. \frac{dH_b}{d\operatorname{vec}^{\mathsf{T}}\mathbf{P}} \right|_{\text{comp}} \mathcal{D}\left(\operatorname{vec}\mathbf{P}\right) \tag{11.91}$$

Term 1 on the right hand side of (11.90) is the derivative of *Hb* with respect to *πb*, given by (11.86). Term 2 is the derivative of the biotic distribution *π<sub>b</sub>* with respect to the full stationary distribution *π*, given by (11.89). Term 3 is the derivative of the stationary distribution *π* with respect to the transition matrix **P**, given by (11.55). Finally, Term 4 is the derivative of the matrix **P**, taking into account the compensation structure, given by (11.83).

The sensitivity and elasticity vectors (11.90) and (11.91) are of dimension 1 × *s*<sup>2</sup> = 1 × 225. To reduce the number of independent perturbations, we consider subsets of the *pij*: disturbance (in which a species is replaced by bare rock), colonization of unoccupied space, replacement of one species by another, and persistence of a species in its location, where

> *P*[disturbance of sp. *i*] = *psi*
>
> *P*[colonization by sp. *i*] = *pis*
>
> *P*[persistence of sp. *i*] = *pii*

**Fig. 11.3** The elasticity of the biotic diversity *Hb(π)* calculated over the biotic states of the stationary distribution of the subtidal benthic community succession model of Hill et al. (2004). States 1–14 correspond to species, numbered in decreasing order of abundance in the stationary distribution. State 15 is bare rock, unoccupied by any species. For the identity of species and the transition matrix, see Appendix B

$$P[\text{replacement of sp. } i] = \sum_{k \neq i,s} p_{ki}$$

$$P[\text{replacement by sp. } i] = \sum_{j \neq i,s} p_{ij}$$

Extracting the corresponding elements of the elasticity vector *ϵHb/ϵ*vec <sup>T</sup>**P** gives the elasticities to these classes of probabilities. Figure 11.3 shows that the dominant species (1 and 2) have impacts that are larger than, and opposite in sign to, those of the remaining species. Biodiversity would be enhanced by increasing the disturbance of, or the replacement of, species 1 and 2, and reduced by increasing the rates of colonization by, persistence of, or replacement by species 1 and 2.

#### *11.5.2 The Kemeny Constant and Ecological Mixing*

Ecologists have used several measures of the rate of convergence of communities modelled by Markov chains, including the damping ratio and Dobrushin's coefficient of ergodicity (Hill et al. 2004). The Kemeny constant *K* is an interesting addition to this list; it gives the expected time to get from any initial state to

**Fig. 11.4** The sensitivity of the Kemeny constant *K* of the subtidal benthic community succession model of Hill et al. (2004). States 1–14 correspond to species, numbered in decreasing order of abundance in the stationary distribution. State 15 is bare rock, unoccupied by any species. For the identity of species and the transition matrix, see Appendix B

a state selected at random from the stationary distribution (Hunter 2006). Once reaching that state, the behavior of the chain and the stationary process are indistinguishable.

The sensitivity of *K*, subject to compensation, is

$$\left. \frac{d\boldsymbol{K}}{d\mathbf{vec}\,^{\mathsf{T}}\mathbf{P}} \right|\_{\mathrm{comp}} = \frac{d\boldsymbol{K}}{d\mathbf{vec}\,^{\mathsf{T}}\mathbf{Z}} \frac{d\mathbf{vec}\,\mathbf{Z}}{d\mathbf{vec}\,^{\mathsf{T}}\mathbf{P}} \frac{d\mathbf{vec}\,\mathbf{P}}{d\mathbf{vec}\,^{\mathsf{T}}\boldsymbol{\Theta}} \tag{11.92}$$

where the three terms on the right hand side are given by (11.65), (11.57), and (11.83), respectively.

Figure 11.4 shows the sensitivities *dK/d*vec <sup>T</sup>**P**, subject to proportional compensation, and aggregated as in Fig. 11.3. Unlike the case with *Hb*, the two dominant species do not stand out from the others. Increases in the rates of replacement will speed up convergence, and increases in persistence will slow it. The disturbance of, colonization by, persistence of, and replacement of species 6 (a sea anemone, *Urticina crassicornis*) have particularly large impacts on *K*. Examination of row 6 and column 6 of **P** (Appendix B) shows that *U. crassicornis* has the highest probability of persistence (*p*<sup>66</sup> = 0.86) and one of the lowest rates of disturbance in the community. Although it is far from dominant (Fig. 11.2), it has a major impact on the rate of mixing.
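A small numerical illustration of the Kemeny constant itself (not the book's matrix-calculus route through **Z**): compute mean first passage times from the fundamental matrix of an ergodic chain and confirm the defining property, that the *π*-weighted mean first passage time is the same from every starting state. The matrix `P` is a made-up column-stochastic example.

```python
import numpy as np

# hypothetical 3-state column-stochastic transition matrix
P = np.array([[0.70, 0.20, 0.10],
              [0.20, 0.60, 0.30],
              [0.10, 0.20, 0.60]])
s = P.shape[0]
T = P.T                                   # row-stochastic orientation for the
                                          # standard first-passage formulas
w, v = np.linalg.eig(P)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

# fundamental matrix Z = (I - T + 1 pi')^{-1}
Z = np.linalg.inv(np.eye(s) - T + np.outer(np.ones(s), pi))
# mean first passage times m_ij = (z_jj - z_ij)/pi_j (m_ii = 0)
M = (np.diag(Z)[None, :] - Z) / pi[None, :]
K = M @ pi                                # K_i = sum_j pi_j m_ij

# Kemeny's constant: the same value from every starting state
assert np.allclose(K, K[0])
assert np.isclose(K[0], np.trace(Z) - 1)
```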

#### **11.6 Discussion**

Given that many properties of finite-state Markov chains can be expressed as simple matrix expressions, matrix calculus is an attractive approach to finding the sensitivity and elasticity to parameter perturbations. Most of the literature on perturbation analysis of Markov chains has focused on the stationary distribution of ergodic chains, but the approach here is equally applicable to absorbing chains, and to dependent variables other than the stationary distribution. The perturbation of ergodic chains has often been studied using generalized inverses, beginning with the influential studies of Meyer (Meyer 1975, 1994; Golub and Meyer 1986; Funderlic and Meyer 1986). Matrix calculus provides a complementary approach; the sensitivity of the stationary distribution *π* obtained here agrees with the result obtained by Golub and Meyer (1986) using the group generalized inverse.

The examples shown here are typical of cases where absorbing or ergodic Markov chains are used in population biology and ecology. In each example, the dependent variables of interest are functions several steps removed from the chain itself. The ease with which one can differentiate such functions is a particularly attractive property of the matrix calculus approach.

#### **A Appendix A: Proofs**

Theorems 11.2.1 and 11.2.2 give the sensitivities of the moments of the number of visits to transient states and of the time to absorption, respectively. These results are obtained by applying matrix calculus to the expressions for the moments. Proofs are given in the text for the first two moments; the proofs for the others follow the same steps but introduce no new concepts, and so are presented here.

#### *A.1 Derivatives of the Moments of Occupancy Times*

To continue the proof of Theorem 11.2.1, take partial differentials of **N**<sup>3</sup> in (11.5) with respect to **N**<sup>1</sup> and **N**dg, to obtain

$$\partial_{\mathbf{N}_1}\mathbf{N}_3 = \left(6\mathbf{N}_{\mathrm{dg}}^2 - 6\mathbf{N}_{\mathrm{dg}} + \mathbf{I}\right) d\mathbf{N}_1 \tag{11.93}$$

$$\partial_{\mathbf{N}_{\mathrm{dg}}}\mathbf{N}_3 = 6\left(d\mathbf{N}_{\mathrm{dg}}\right)\mathbf{N}_{\mathrm{dg}}\mathbf{N}_1 + 6\mathbf{N}_{\mathrm{dg}}\left(d\mathbf{N}_{\mathrm{dg}}\right)\mathbf{N}_1 - 6\left(d\mathbf{N}_{\mathrm{dg}}\right)\mathbf{N}_1 \tag{11.94}$$

Applying the vec operator to each term and using Roth's theorem gives

$$\partial_{\mathbf{N}_1}\operatorname{vec}\mathbf{N}_3 = \left[\mathbf{I} \otimes \left(6\mathbf{N}_{\mathrm{dg}}^2 - 6\mathbf{N}_{\mathrm{dg}} + \mathbf{I}\right)\right] d\operatorname{vec}\mathbf{N}_1 \tag{11.95}$$

$$\partial_{\mathbf{N}_{\mathrm{dg}}}\operatorname{vec}\mathbf{N}_3 = \left[6\left(\mathbf{N}_1^{\mathsf{T}}\mathbf{N}_{\mathrm{dg}} \otimes \mathbf{I}\right) + 6\left(\mathbf{N}_1^{\mathsf{T}} \otimes \mathbf{N}_{\mathrm{dg}}\right) - 6\left(\mathbf{N}_1^{\mathsf{T}} \otimes \mathbf{I}\right)\right] d\operatorname{vec}\mathbf{N}_{\mathrm{dg}}\,. \tag{11.96}$$

Substituting (11.95) and (11.96) into (11.13) gives (11.9).

Taking partial differentials of **N**<sup>4</sup> in (11.6) gives

$$\boldsymbol{\partial\_{\rm N\_l}N\_4} = \left(24\mathbf{N\_{dg}^3} - 36\mathbf{N\_{dg}^2} + 14\mathbf{N\_{dg}} - \mathbf{I}\right)d\mathbf{N\_l} \tag{11.97}$$

$$\begin{aligned} \partial_{\mathbf{N}_{\mathrm{dg}}}\mathbf{N}_4 &= 24\left(d\mathbf{N}_{\mathrm{dg}}\right)\mathbf{N}_{\mathrm{dg}}^2\mathbf{N}_1 + 24\mathbf{N}_{\mathrm{dg}}\left(d\mathbf{N}_{\mathrm{dg}}\right)\mathbf{N}_{\mathrm{dg}}\mathbf{N}_1 + 24\mathbf{N}_{\mathrm{dg}}^2\left(d\mathbf{N}_{\mathrm{dg}}\right)\mathbf{N}_1 \\ &\quad - 36\left(d\mathbf{N}_{\mathrm{dg}}\right)\mathbf{N}_{\mathrm{dg}}\mathbf{N}_1 - 36\mathbf{N}_{\mathrm{dg}}\left(d\mathbf{N}_{\mathrm{dg}}\right)\mathbf{N}_1 + 14\left(d\mathbf{N}_{\mathrm{dg}}\right)\mathbf{N}_1 \end{aligned} \tag{11.98}$$

Applying the vec operator yields

$$\partial_{\mathbf{N}_1}\operatorname{vec}\mathbf{N}_4 = \left[\mathbf{I} \otimes \left(24\mathbf{N}_{\mathrm{dg}}^3 - 36\mathbf{N}_{\mathrm{dg}}^2 + 14\mathbf{N}_{\mathrm{dg}} - \mathbf{I}\right)\right] d\operatorname{vec}\mathbf{N}_1 \tag{11.99}$$

$$\begin{aligned} \partial_{\mathbf{N}_{\mathrm{dg}}}\operatorname{vec}\mathbf{N}_4 &= \left[24\left(\mathbf{N}_1^{\mathsf{T}}\mathbf{N}_{\mathrm{dg}}^2 \otimes \mathbf{I}\right) + 24\left(\mathbf{N}_1^{\mathsf{T}}\mathbf{N}_{\mathrm{dg}} \otimes \mathbf{N}_{\mathrm{dg}}\right) + 24\left(\mathbf{N}_1^{\mathsf{T}} \otimes \mathbf{N}_{\mathrm{dg}}^2\right) \right. \\ &\quad \left. - 36\left(\mathbf{N}_1^{\mathsf{T}}\mathbf{N}_{\mathrm{dg}} \otimes \mathbf{I}\right) - 36\left(\mathbf{N}_1^{\mathsf{T}} \otimes \mathbf{N}_{\mathrm{dg}}\right) + 14\left(\mathbf{N}_1^{\mathsf{T}} \otimes \mathbf{I}\right)\right] d\operatorname{vec}\mathbf{N}_{\mathrm{dg}}\,. \end{aligned} \tag{11.100}$$

Substituting (11.99) and (11.100) into (11.13) gives (11.10).

#### *A.2 Derivatives of the Moments of Time to Absorption*

To continue the proof of Theorem 11.2.2, take partial differentials of *η*<sup>3</sup> in (11.21) with respect to *η*<sup>1</sup> and **N**1, to obtain

$$\partial_{\eta_1}\eta_3^{\mathsf{T}} = \left(d\eta_1^{\mathsf{T}}\right)\left(6\mathbf{N}_1^2 - 6\mathbf{N}_1 + \mathbf{I}\right) \tag{11.101}$$

$$\partial_{\mathbf{N}_1}\eta_3^{\mathsf{T}} = 6\eta_1^{\mathsf{T}}\left(d\mathbf{N}_1\right)\mathbf{N}_1 + 6\eta_1^{\mathsf{T}}\mathbf{N}_1\left(d\mathbf{N}_1\right) - 6\eta_1^{\mathsf{T}}\left(d\mathbf{N}_1\right). \tag{11.102}$$

Applying the vec operator yields

$$\partial_{\eta_1}\eta_3 = \left(6\mathbf{N}_1^2 - 6\mathbf{N}_1 + \mathbf{I}\right)^{\mathsf{T}} d\eta_1 \tag{11.103}$$

$$\partial_{\mathbf{N}_1}\eta_3 = \left[6\left(\mathbf{N}_1^{\mathsf{T}} \otimes \eta_1^{\mathsf{T}}\right) + 6\left(\mathbf{I} \otimes \eta_1^{\mathsf{T}}\mathbf{N}_1\right) - 6\left(\mathbf{I} \otimes \eta_1^{\mathsf{T}}\right)\right] d\operatorname{vec}\mathbf{N}_1 \tag{11.104}$$

which combine to yield (11.25).

The partial differentials of *η*<sup>4</sup> in (11.22) with respect to *η*<sup>1</sup> and **N**<sup>1</sup> are

$$\partial_{\eta_1}\eta_4^{\mathsf{T}} = d\eta_1^{\mathsf{T}}\left(24\mathbf{N}_1^3 - 36\mathbf{N}_1^2 + 14\mathbf{N}_1 - \mathbf{I}\right) \tag{11.105}$$

$$\begin{aligned} \partial_{\mathbf{N}_1}\eta_4^{\mathsf{T}} &= \eta_1^{\mathsf{T}}\left[24\left(d\mathbf{N}_1\right)\mathbf{N}_1^2 + 24\mathbf{N}_1\left(d\mathbf{N}_1\right)\mathbf{N}_1 + 24\mathbf{N}_1^2\left(d\mathbf{N}_1\right) \right. \\ &\quad \left. - 36\left(d\mathbf{N}_1\right)\mathbf{N}_1 - 36\mathbf{N}_1\left(d\mathbf{N}_1\right) + 14\left(d\mathbf{N}_1\right)\right]. \end{aligned} \tag{11.106}$$

Applying the vec operator to each equation gives

$$\partial_{\eta_1}\eta_4 = \left(24\mathbf{N}_1^3 - 36\mathbf{N}_1^2 + 14\mathbf{N}_1 - \mathbf{I}\right)^{\mathsf{T}} d\eta_1 \tag{11.107}$$

$$\begin{aligned} \partial_{\mathbf{N}_1}\eta_4 &= \left\{24\left[\left(\mathbf{N}_1^2\right)^{\mathsf{T}} \otimes \eta_1^{\mathsf{T}}\right] + 24\left(\mathbf{N}_1^{\mathsf{T}} \otimes \eta_1^{\mathsf{T}}\mathbf{N}_1\right) + 24\left(\mathbf{I} \otimes \eta_1^{\mathsf{T}}\mathbf{N}_1^2\right) \right. \\ &\quad \left. - 36\left(\mathbf{N}_1^{\mathsf{T}} \otimes \eta_1^{\mathsf{T}}\right) - 36\left(\mathbf{I} \otimes \eta_1^{\mathsf{T}}\mathbf{N}_1\right) + 14\left(\mathbf{I} \otimes \eta_1^{\mathsf{T}}\right)\right\} d\operatorname{vec}\mathbf{N}_1 \end{aligned} \tag{11.108}$$

which combine to give (11.26).

#### **B Appendix B: Marine Community Matrix**


The transition matrix for the marine benthic community (Hill et al. 2004) is

$$\mathbf{P} = \begin{pmatrix}
0.771 & 0.145 & 0.052 & 0.017 & 0.117 & 0.009 & 0.241 & 0.199 & 0.056 & 0.309 & 0.056 & 0.025 & 0.321 & 0.158 & 0.101 \\
0.102 & 0.609 & 0.061 & 0.054 & 0.218 & 0.024 & 0.223 & 0.235 & 0.147 & 0.228 & 0.222 & 0.068 & 0.179 & 0.448 & 0.320 \\
0.017 & 0.031 & 0.710 & 0.006 & 0.035 & 0.012 & 0.051 & 0.038 & 0.026 & 0.031 & 0.028 & 0.018 & 0.023 & 0.018 & 0.025 \\
0.004 & 0.011 & 0.004 & 0.839 & 0.004 & 0.000 & 0.016 & 0.018 & 0.011 & 0.010 & 0.008 & 0.030 & 0.000 & 0.018 & 0.009 \\
0.015 & 0.028 & 0.020 & 0.005 & 0.404 & 0.016 & 0.080 & 0.089 & 0.020 & 0.027 & 0.036 & 0.016 & 0.063 & 0.085 & 0.062 \\
0.001 & 0.005 & 0.004 & 0.000 & 0.008 & 0.863 & 0.024 & 0.007 & 0.006 & 0.006 & 0.000 & 0.000 & 0.000 & 0.006 & 0.005 \\
0.018 & 0.022 & 0.008 & 0.004 & 0.033 & 0.001 & 0.105 & 0.044 & 0.011 & 0.042 & 0.025 & 0.010 & 0.030 & 0.030 & 0.048 \\
0.012 & 0.025 & 0.008 & 0.006 & 0.032 & 0.007 & 0.041 & 0.154 & 0.026 & 0.031 & 0.020 & 0.016 & 0.020 & 0.018 & 0.034 \\
0.002 & 0.011 & 0.025 & 0.008 & 0.013 & 0.016 & 0.014 & 0.015 & 0.586 & 0.010 & 0.007 & 0.004 & 0.003 & 0.018 & 0.013 \\
0.014 & 0.015 & 0.003 & 0.004 & 0.007 & 0.003 & 0.033 & 0.027 & 0.021 & 0.165 & 0.007 & 0.003 & 0.020 & 0.030 & 0.031 \\
0.003 & 0.012 & 0.005 & 0.006 & 0.006 & 0.004 & 0.025 & 0.016 & 0.006 & 0.013 & 0.507 & 0.001 & 0.017 & 0.006 & 0.017 \\
0.002 & 0.008 & 0.007 & 0.011 & 0.005 & 0.007 & 0.005 & 0.020 & 0.005 & 0.008 & 0.002 & 0.537 & 0.000 & 0.006 & 0.017 \\
0.005 & 0.005 & 0.002 & 0.000 & 0.006 & 0.000 & 0.014 & 0.009 & 0.001 & 0.012 & 0.005 & 0.003 & 0.248 & 0.000 & 0.011 \\
0.003 & 0.004 & 0.008 & 0.003 & 0.005 & 0.000 & 0.012 & 0.009 & 0.005 & 0.006 & 0.003 & 0.003 & 0.000 & 0.030 & 0.013 \\
0.029 & 0.069 & 0.084 & 0.036 & 0.108 & 0.036 & 0.115 & 0.122 & 0.074 & 0.104 & 0.076 & 0.266 & 0.076 & 0.127 & 0.294
\end{pmatrix} \tag{11.109}$$

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 12 Sensitivity Analysis of Continuous Markov Chains**

#### **12.1 Introduction**

When Markov chains are used as mathematical models of natural or social phenomena, the transition intensities or probabilities are usually defined in terms of parameters that are relevant to the scientific question at hand. Sensitivity analysis of such models is important because it quantifies the dependence of the model behavior on the parameters. This chapter presents sensitivity results for finite-state, continuous-time absorbing Markov chains, paralleling the approach for discrete-time chains in Chap. 11. In absorbing chains, interest focuses on behavior prior to absorption (time spent in transient states and time to absorption) and on the probabilities of absorption in each absorbing state. Here we will derive formulae for the sensitivity and the elasticity (i.e., proportional sensitivity) of the moments of the time to absorption, the time spent in each transient state, and the number of visits to each transient state.

The most basic difference between discrete-time and continuous-time Markov chains is that the former are defined by transition probabilities, while the latter are defined by transition rates. This leads to differences in the structure of the matrices, but there is a nice parallelism in the results.

Perturbation analysis of Markov chains has a long history (Schweitzer 1968; Meyer 1975). Most of the literature, however, is devoted to discrete-time chains, and most of that focuses on ergodic chains and the perturbation analysis of the stationary distribution; e.g. Funderlic and Meyer (1986), Golub and Meyer (1986), Hunter (2005), Cho and Meyer (2000), and Seneta (1993). Much less attention has been paid to continuous-time chains. Perturbation expansions have been developed

Chapter 12 is modified, by permission of John Wiley and Sons, from: Caswell, H. 2012. Perturbation analysis of continuous-time absorbing Markov chains. Numerical Linear Algebra with Applications 18:901-917. ©John Wiley and Sons.

for the stationary distribution of ergodic continuous-time chains, with application to queueing models (Altman et al. 2004), and sensitivity results and perturbation bounds presented for transient solutions (Ramesh and Trivedi 1993; Mitrophanov 2004). The operations research literature contains many studies of the sensitivity of performance measures calculated over realizations of a continuous-time ergodic Markov chain; e.g., Cao (1989), Glasserman (1992), and Cao et al. (1996). The results to be presented here complement and extend the existing literature on perturbation analysis of Markov chains, by focusing on the statistical properties of the solutions of absorbing continuous-time chains, by introducing the use of matrix calculus, and (as a consequence of that technique) extending the range of parameters whose effects can be evaluated.

#### *12.1.1 Absorbing Markov Chains*

I consider a finite-state, homogeneous, continuous-time Markov chain with intensity matrix **Q**, where *qij* is the rate of transition from stage *j* to stage *i*. The intensity matrix satisfies *qij* ≥ 0 for *i* ≠ *j* and $q_{jj} = -\sum_{i \neq j} q_{ij}$. Note that **Q** is written in column-to-row orientation, and operates on column vectors. An absorbing chain contains at least one absorbing class of states. Numbering the states so that the transient states appear before the absorbing states leads to the intensity matrix

$$\mathbf{Q} = \left(\frac{\mathbf{U} \mid \mathbf{0}}{\mathbf{M} \mid \mathbf{0}}\right). \tag{12.1}$$

The matrix **U** contains rates of transitions among the transient states, and **M** contains the rates of transition from transient to absorbing states.
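A minimal illustration of the block structure (12.1), with hypothetical rates: two transient states and one absorbing state. The defining property of the column-to-row orientation is that every column of **Q** sums to zero.

```python
import numpy as np

# hypothetical rates: U among transient states, M transient -> absorbing
U = np.array([[-0.5,  0.1],
              [ 0.2, -0.4]])
M = np.array([[0.3, 0.3]])

# intensity matrix in the block form of Eq. (12.1)
Q = np.block([[U, np.zeros((2, 1))],
              [M, np.zeros((1, 1))]])

# each column of an intensity matrix sums to zero
assert np.allclose(Q.sum(axis=0), 0.0)
```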

I assume that **U** and **M** are differentiable functions of a vector *θ* of parameters, and that **Q**[*θ*] remains an intensity matrix for sufficiently small perturbations of *θ*. This includes as a special case the situation where the elements of *θ* are simply some or all of the *qij*, *i* ≠ *j*. The goal of the perturbation analysis is to obtain the derivatives of properties of the chain with respect to *θ*.

#### **12.2 Occupancy Time in Transient States**

Let *s* be the number of transient states, and *νij* be the time spent in transient state *i* by an individual starting in transient state *j*. Define $\mathbf{N}_k = E\left(\nu_{ij}^k\right)$ as the matrix whose entries are the *k*th moments, and **N**dg = *(***N**1*)*dg. The matrix **N**<sup>1</sup> of expectations is the fundamental matrix of the chain. The first several moments of the occupancy times are given by the entries of the matrices

$$\mathbf{N}_1 = -\mathbf{U}^{-1} \tag{12.2}$$

$$\mathbf{N}\_2 = 2\mathbf{N}\_\mathrm{dg}\mathbf{N}\_\mathrm{l} \tag{12.3}$$

$$\mathbf{N}\_3 = 6\mathbf{N}\_{\rm dg}^2 \mathbf{N}\_{\rm l} \tag{12.4}$$

$$\mathbf{N}_4 = 24\mathbf{N}_{\mathrm{dg}}^3\mathbf{N}_1 \tag{12.5}$$

and, in general, by

$$\mathbf{N}\_{k} = k \mathbf{N}\_{\rm dg} \mathbf{N}\_{k-1} \qquad k \ge 2 \tag{12.6}$$

(Iosifescu 1980, Thm. 8.7).

The differentials of the moments (12.2), (12.3), (12.4), and (12.5) are

$$d\text{vec}\,\mathbf{N}\_{\text{l}} = \left(\mathbf{N}\_{\text{l}}^{\mathsf{T}} \otimes \mathbf{N}\_{\text{l}}\right) d\text{vec}\,\mathbf{U} \tag{12.7}$$

$$d\text{vec}\,\mathbf{N}\_2 = 2\left\{ \left( \mathbf{N}\_1^\mathsf{T} \otimes \mathbf{I} \right) \mathcal{D}\left( \text{vec}\,\mathbf{I} \right) + \left( \mathbf{I} \otimes \mathbf{N}\_{\text{dg}} \right) \right\} \left( \mathbf{N}\_1^\mathsf{T} \otimes \mathbf{N}\_1 \right) d\text{vec}\,\mathbf{U} \qquad (12.8)$$

$$d\operatorname{vec}\mathbf{N}_3 = 6\left\{2\left(\mathbf{N}_1^{\mathsf{T}} \otimes \mathbf{N}_{\mathrm{dg}}\right)\mathcal{D}\left(\operatorname{vec}\mathbf{I}\right) + \left(\mathbf{I} \otimes \mathbf{N}_{\mathrm{dg}}^2\right)\right\}\left(\mathbf{N}_1^{\mathsf{T}} \otimes \mathbf{N}_1\right) d\operatorname{vec}\mathbf{U} \tag{12.9}$$

$$d\text{vec}\,\mathbf{N}\_4 = 24\left\{ 3\left(\mathbf{N}\_1^\mathsf{T} \otimes \mathbf{N}\_{\text{dg}}^2\right) \mathcal{D}\left(\text{vec}\,\mathbf{I}\right) + \left(\mathbf{I} \otimes \mathbf{N}\_{\text{dg}}^3\right) \right\} \left(\mathbf{N}\_1^\mathsf{T} \otimes \mathbf{N}\_1\right) d\text{vec}\,\mathbf{U} \tag{12.10}$$

where **I** = **I***<sup>s</sup>* throughout. A recursive relation for all the moments is

$$d\operatorname{vec}\mathbf{N}_k = k\left(\mathbf{N}_{k-1}^{\mathsf{T}} \otimes \mathbf{I}\right)\mathcal{D}\left(\operatorname{vec}\mathbf{I}\right) d\operatorname{vec}\mathbf{N}_1 + k\left(\mathbf{I} \otimes \mathbf{N}_{\mathrm{dg}}\right) d\operatorname{vec}\mathbf{N}_{k-1} \qquad k \ge 2. \tag{12.11}$$
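The building block for all of these results, Eq. (12.7), is easy to verify by finite differences. A sketch with a hypothetical 2 × 2 transient block **U**:

```python
import numpy as np

# numerical check of Eq. (12.7): dvec N1 = (N1' kron N1) dvec U
U = np.array([[-0.5,  0.1],
              [ 0.2, -0.4]])
N1 = -np.linalg.inv(U)
J = np.kron(N1.T, N1)                # d vec N1 / d vec' U

eps = 1e-7
fd = np.zeros((4, 4))
for col in range(4):
    dU = np.zeros(4); dU[col] = eps
    Np = -np.linalg.inv(U + dU.reshape(2, 2, order='F'))
    fd[:, col] = (Np - N1).flatten(order='F') / eps
assert np.allclose(fd, J, atol=1e-4)
```

The column-major `order='F'` handling matches the column-stacking vec convention.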

The variance, standard deviation, and coefficient of variation of the *νij* are important in applications; they are

$$V\left(\nu_{ij}\right) = \mathbf{N}_2 - \mathbf{N}_1 \circ \mathbf{N}_1 \tag{12.12}$$

$$SD\left(\nu_{ij}\right) = \sqrt{V\left(\nu_{ij}\right)} \tag{12.13}$$

$$CV\left(\nu_{ij}\right) = \mathcal{D}\left(\operatorname{vec}\mathbf{N}_1\right)^{-1}\operatorname{vec}SD\left(\nu_{ij}\right) \tag{12.14}$$

where the square root is taken elementwise. Their derivatives are

$$d\operatorname{vec}V = 2\left[\left(\mathbf{N}_1^{\mathsf{T}} \otimes \mathbf{I}\right)\mathcal{D}\left(\operatorname{vec}\mathbf{I}\right) + \left(\mathbf{I} \otimes \mathbf{N}_{\mathrm{dg}}\right) - \mathcal{D}\left(\operatorname{vec}\mathbf{N}_1\right)\right] d\operatorname{vec}\mathbf{N}_1 \tag{12.15}$$

$$d\operatorname{vec}SD = \frac{1}{2}\mathcal{D}\left[\operatorname{vec}SD\left(\nu_{ij}\right)\right]^{-1} d\operatorname{vec}V \tag{12.16}$$

$$\begin{aligned} d\operatorname{vec}CV &= \mathcal{D}\left(\operatorname{vec}\mathbf{N}_1\right)^{-1} d\operatorname{vec}SD \\ &\quad - \left[\left(\operatorname{vec}SD\right)^{\mathsf{T}}\mathcal{D}\left(\operatorname{vec}\mathbf{N}_1\right)^{-1} \otimes \mathcal{D}\left(\operatorname{vec}\mathbf{N}_1\right)^{-1}\right] \\ &\qquad \times \mathcal{D}\left(\operatorname{vec}\mathbf{I}_{s^2}\right)\left(\mathbf{1}_{s^2} \otimes \mathbf{I}_{s^2}\right) d\operatorname{vec}\mathbf{N}_1 \end{aligned} \tag{12.17}$$

(suppressing the arguments of *V* , *SD* and *CV* ). Because **N**<sup>1</sup> usually contains zeros, <sup>D</sup> *(*vec **<sup>N</sup>**1*)*−<sup>1</sup> must be restricted to the non-zero entries; the coefficient of variation is undefined if the mean is zero.

**Derivation** The fundamental matrix **<sup>N</sup>**<sup>1</sup> = −**U**−1. Applying (2.82) yields (12.7). The derivatives of the higher moments are obtained by differentiating **N**<sup>2</sup> – **N**<sup>4</sup> in (12.3), (12.4), and (12.5). For example, the differential of **N**<sup>4</sup> is

$$d\mathbf{N}\_{\mathsf{d}} = 24 \left\{ 3\mathbf{N}\_{\mathsf{dg}}^2 \left( d\mathbf{N}\_{\mathsf{dg}} \right) \mathbf{N}\_{\mathsf{l}} + \mathbf{N}\_{\mathsf{dg}}^3 \left( d\mathbf{N}\_{\mathsf{l}} \right) \right\},\tag{12.18}$$

using the fact that **N**dg commutes with itself and *d***N**dg. Applying the vec operator gives

$$d\operatorname{vec}\mathbf{N}_4 = 24\left\{3\left(\mathbf{N}_1^{\mathsf{T}} \otimes \mathbf{N}_{\mathrm{dg}}^2\right) d\operatorname{vec}\mathbf{N}_{\mathrm{dg}} + \left(\mathbf{I}_s \otimes \mathbf{N}_{\mathrm{dg}}^3\right) d\operatorname{vec}\mathbf{N}_1\right\}. \tag{12.19}$$

Substituting (11.12) for *d*vec **N**dg and (12.7) for *d*vec **N**<sup>1</sup> gives (12.10). Results (12.8) and (12.9) are obtained in similar fashion.

Differentiating the recurrence relationship (12.6) gives

$$d\mathbf{N}_k = k\left(d\mathbf{N}_{\mathrm{dg}}\right)\mathbf{N}_{k-1} + k\,\mathbf{N}_{\mathrm{dg}}\left(d\mathbf{N}_{k-1}\right). \tag{12.20}$$

Apply the vec operator,

$$d\text{vec}\,\mathbf{N}\_k = k\left(\mathbf{N}\_{k-1}^\mathsf{T}\otimes\mathbf{I}\_s\right)d\text{vec}\,\mathbf{N}\_{\text{dg}} + k\left(\mathbf{I}\_s\otimes\mathbf{N}\_{\text{dg}}\right)d\text{vec}\,\mathbf{N}\_{k-1},\tag{12.21}$$

and substitute (11.12) for *d*vec **N**dg to obtain (12.11).

The derivative of *V* in (12.15) comes from differentiating (12.12),

$$dV = d\mathbf{N}\_2 - 2\mathbf{N}\_1 \diamond d\mathbf{N}\_1,\tag{12.22}$$

applying the vec operator,

$$d\text{vec}\,V = d\text{vec}\,\mathbf{N}\_2 - 2\mathcal{D}\,\left(\text{vec}\,\mathbf{N}\_1\right)d\text{vec}\,\mathbf{N}\_1,\tag{12.23}$$

and then using (12.7) and (12.8). The derivative of *SD νij* in (12.16) follows from (2.83). The derivative of *CV νij* in (12.17) is obtained using (2.84), with **x** = vec *SD* and **y** = vec **N**1.

#### **12.3 Longevity: Time to Absorption**

Let *ηj* be the time to absorption for an individual currently in transient state *j* . The vectors of the *k*th moments of the time to absorption, *ηk*, satisfy

$$
\boldsymbol{\eta}\_{1}^{\mathsf{T}} = \mathbf{1}^{\mathsf{T}} \mathbf{N}\_{1} \tag{12.24}
$$

$$\eta_2^{\mathsf{T}} = (2)\,\mathbf{1}^{\mathsf{T}}\mathbf{N}_1^2 \tag{12.25}$$

$$\eta_3^{\mathsf{T}} = (6)\,\mathbf{1}^{\mathsf{T}}\mathbf{N}_1^3 \tag{12.26}$$

$$\eta_4^{\mathsf{T}} = (24)\,\mathbf{1}^{\mathsf{T}}\mathbf{N}_1^4 \tag{12.27}$$

and in general

$$
\boldsymbol{\eta}\_{k}^{\mathsf{T}} = k \boldsymbol{\eta}\_{k-1}^{\mathsf{T}} \mathbf{N}\_{1} \qquad k \ge 2 \tag{12.28}
$$

(Iosifescu 1980, Thm. 8.6).

The variance, standard deviation, and coefficient of variation of the time to absorption are

$$V(\eta) = \eta\_2 - \eta\_1 \circ \eta\_1 \tag{12.29}$$

$$SD\left(\eta\right) = \sqrt{V\left(\eta\right)}\tag{12.30}$$

$$CV\left(\eta\right) = \mathcal{D}\left(\boldsymbol{\eta}\_1\right)^{-1} SD\left(\eta\right)\tag{12.31}$$

with the square root taken elementwise.
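The moments and derived statistics above can be sketched numerically; the 2-state **U** below is made up for illustration, with **N**1 = (−**U**)<sup>−1</sup> as in (12.2).

```python
import numpy as np

# Made-up 2-state transient block U.
U = np.array([[-0.3, 0.0],
              [ 0.1, -0.2]])
s = U.shape[0]
N1 = np.linalg.inv(-U)            # fundamental matrix
ones = np.ones(s)

eta1 = ones @ N1                  # Eq. (12.24): mean time to absorption
eta2 = 2 * ones @ (N1 @ N1)       # Eq. (12.25): second moment
V = eta2 - eta1 * eta1            # Eq. (12.29): variance
SD = np.sqrt(V)                   # Eq. (12.30)
CV = SD / eta1                    # Eq. (12.31): coefficient of variation
```

For this example both starting states have mean time to absorption 5 and coefficient of variation 1.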

The derivatives of the moments in (12.24), (12.25), (12.26), and (12.27) are given by

$$d\eta\_1 = \left(\mathbf{N}\_1^{\mathsf{T}} \otimes \boldsymbol{\eta}\_1^{\mathsf{T}}\right) d\mathbf{vec} \,\mathbf{U} \tag{12.32}$$

$$d\eta\_2 = \left\{ 2\left[ \left( \mathbf{N}\_1^{\sf T} \right)^2 \otimes \boldsymbol{\eta}\_1^{\sf T} \right] + 2\left( \mathbf{N}\_1^{\sf T} \otimes \boldsymbol{\eta}\_1^{\sf T} \mathbf{N}\_1 \right) \right\} d\mathbf{vec} \,\mathbf{U} \tag{12.33}$$

$$d\eta\_3 = \left\{6\left[\left(\mathbf{N}\_1^\mathsf{T}\right)^3 \otimes \boldsymbol{\eta}\_1^\mathsf{T}\right] + 6\left[\left(\mathbf{N}\_1^\mathsf{T}\right)^2 \otimes \boldsymbol{\eta}\_1^\mathsf{T}\mathbf{N}\_1\right] + 3\left(\mathbf{N}\_1^\mathsf{T} \otimes \boldsymbol{\eta}\_2^\mathsf{T}\mathbf{N}\_1\right)\right\}d\mathrm{vec}\,\mathbf{U}\tag{12.34}$$

$$d\eta\_4 = \left\{ 24 \left[ \left( \mathbf{N}\_1^\mathsf{T} \right)^4 \otimes \boldsymbol{\eta}\_1^\mathsf{T} \right] + 24 \left[ \left( \mathbf{N}\_1^\mathsf{T} \right)^3 \otimes \boldsymbol{\eta}\_1^\mathsf{T} \mathbf{N}\_1 \right] + 12 \left[ \left( \mathbf{N}\_1^\mathsf{T} \right)^2 \otimes \boldsymbol{\eta}\_2^\mathsf{T} \mathbf{N}\_1 \right] + 4 \left( \mathbf{N}\_1^\mathsf{T} \otimes \boldsymbol{\eta}\_3^\mathsf{T} \mathbf{N}\_1 \right) \right\} d\mathrm{vec} \, \mathbf{U} \tag{12.35}$$

and, recursively,

$$d\eta\_k = k\mathbf{N}\_1^{\mathsf{T}} d\eta\_{k-1} + k\left(\mathbf{I}\_s \otimes \boldsymbol{\eta}\_{k-1}^{\mathsf{T}}\right) d\mathbf{vec} \,\mathbf{N}\_1. \tag{12.36}$$

The derivatives of the variance, standard deviation, and coefficient of variation of the time to absorption are (suppressing the arguments)

$$dV = 2\left\{ \left[ \left( \mathbf{N}\_1^{\mathsf{T}} \right)^{2} \otimes \boldsymbol{\eta}\_1^{\mathsf{T}} \right] + \left( \mathbf{N}\_1^{\mathsf{T}} \otimes \boldsymbol{\eta}\_1^{\mathsf{T}} \mathbf{N}\_1 \right) - \mathcal{D}\left( \boldsymbol{\eta}\_1 \right) \left( \mathbf{N}\_1^{\mathsf{T}} \otimes \boldsymbol{\eta}\_1^{\mathsf{T}} \right) \right\} d\mathrm{vec} \,\mathbf{U} \tag{12.37}$$

$$\,dSD = \frac{1}{2} \mathcal{D} \, (SD)^{-1} \, dV \,\tag{12.38}$$

$$dCV = \mathcal{D} \left(\boldsymbol{\eta}\_1\right)^{-1} dSD - \left[ SD^{\mathsf{T}} \mathcal{D} \left(\boldsymbol{\eta}\_1\right)^{-1} \otimes \mathcal{D} \left(\boldsymbol{\eta}\_1\right)^{-1} \right] \mathcal{D} \left(\mathrm{vec} \,\mathbf{I}\_s\right) \left(\mathbf{I}\_s \otimes \mathbf{I}\_s\right) d\eta\_1.\tag{12.39}$$

**Derivation** Differentiating (12.24) for the expected time to absorption gives

$$d\boldsymbol{\eta}\_1^\mathsf{T} = \mathbf{1}\_s^\mathsf{T} d\mathbf{N}\_1.\tag{12.40}$$

Applying the vec operator, substituting (12.7) for *d*vec **N**1, and simplifying gives (12.32). The derivatives of the higher moments are obtained in the same way; e.g., for *η*4,

$$d\eta\_4^\mathsf{T} = (24)\mathbf{1}\_s^\mathsf{T} \left[ (d\mathbf{N}\_1)\,\mathbf{N}\_1^3 + \mathbf{N}\_1 \,(d\mathbf{N}\_1)\,\mathbf{N}\_1^2 + \mathbf{N}\_1^2 (d\mathbf{N}\_1)\,\mathbf{N}\_1 + \mathbf{N}\_1^3 (d\mathbf{N}\_1) \right]. \tag{12.41}$$

Applying the vec operator yields

$$d\eta\_4 = 24\left\{ \left[ \left( \mathbf{N}\_1^{\mathsf{T}} \right)^3 \otimes \mathbf{1}\_s^{\mathsf{T}} \right] + \left[ \left( \mathbf{N}\_1^{\mathsf{T}} \right)^2 \otimes \mathbf{1}\_s^{\mathsf{T}} \mathbf{N}\_1 \right] + \left[ \mathbf{N}\_1^{\mathsf{T}} \otimes \mathbf{1}\_s^{\mathsf{T}} \mathbf{N}\_1^2 \right] + \left[ \mathbf{I}\_s \otimes \mathbf{1}\_s^{\mathsf{T}} \mathbf{N}\_1^3 \right] \right\} \, d\mathrm{vec} \, \mathbf{N}\_1 . \tag{12.42}$$

Substituting (12.7) for *d*vec **N**1 and simplifying using Eqs. (12.24), (12.25), and (12.26) gives (12.35). The derivatives of the second and third moments, (12.33) and (12.34), are obtained in similar fashion.

The recursive formula (12.36) is obtained by differentiating (12.28)

$$d\boldsymbol{\eta}\_{k}^{\mathsf{T}} = k \left( d\boldsymbol{\eta}\_{k-1}^{\mathsf{T}} \right) \mathbf{N}\_{1} + k \boldsymbol{\eta}\_{k-1}^{\mathsf{T}} d\mathbf{N}\_{1}.\tag{12.43}$$

Apply the vec operator,

$$d\eta\_k = k\mathbf{N}\_1^{\mathsf{T}} d\eta\_{k-1} + k\left(\mathbf{I}\_s \otimes \boldsymbol{\eta}\_{k-1}^{\mathsf{T}}\right) d\mathbf{vec} \,\mathbf{N}\_1,\tag{12.44}$$

substitute (12.7) for *d*vec **N**1, and simplify, to obtain (12.36).

Differentiating (12.29) for the variance yields

$$dV = d\boldsymbol{\eta}\_2 - 2\boldsymbol{\eta}\_1 \circ d\boldsymbol{\eta}\_1. \tag{12.45}$$

Applying the vec operator gives

$$dV = d\boldsymbol{\eta}\_2 - 2\mathcal{D}\left(\boldsymbol{\eta}\_1\right)d\boldsymbol{\eta}\_1.\tag{12.46}$$

Substituting (12.32) for *dη*<sup>1</sup> and (12.33) for *dη*<sup>2</sup> gives the result (12.37). The derivatives of the standard deviation, in (12.38), and the coefficient of variation, in (12.39), are obtained by differentiating (12.30) and (12.31) and applying (2.83) and (2.84).
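Formula (12.32) can be checked against a finite difference. A sketch under the same assumptions (made-up 2-state **U**, with **N**1 = (−**U**)<sup>−1</sup>); note that vec stacks columns, so column-major ordering is used.

```python
import numpy as np

# Check d eta_1 = (N1' kron eta_1') dvec U  (Eq. 12.32) by finite differences.
U = np.array([[-0.3, 0.0],
              [ 0.1, -0.2]])
s = U.shape[0]

def eta1_of(U):
    return np.ones(s) @ np.linalg.inv(-U)   # mean time to absorption, Eq. (12.24)

N1 = np.linalg.inv(-U)
eta1 = eta1_of(U)
S = np.kron(N1.T, eta1[None, :])            # s x s^2 sensitivity matrix of Eq. (12.32)

h = 1e-6
i, j = 1, 0                                 # perturb u_{21}: the rate from state 1 to 2
Uh = U.copy(); Uh[i, j] += h
fd = (eta1_of(Uh) - eta1) / h
# vec stacks columns, so entry (i,j) of U is element j*s + i of vec U
assert np.allclose(fd, S[:, j * s + i], atol=1e-4)
```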

#### **12.4 Multiple Absorbing States and Probabilities of Absorption**

Consider a chain that includes *a >* 1 absorbing states. The entry *mij* of the *a* × *s* submatrix **M** in (12.1) is the rate of transition from transient state *j* to absorbing state *i*. The probabilities of absorption are defined as

$$b\_{ij} = P\left[\text{absorption in } i \mid \text{starting in } j\right]. \tag{12.47}$$

The *a* × *s* matrix **B** = (*bij*) is

$$\mathbf{B} = \mathbf{M} \mathbf{N}\_1 \tag{12.48}$$

(Iosifescu 1980, Section 8.5.6). Column *j* of **B** is the probability distribution of the eventual absorption state for an individual starting in transient state *j*. Usually a few starting states are of particular interest (e.g., states corresponding to "birth"). Let **B**(:, *j*) = **Be***j* denote column *j* of **B**, where **e***j* is the *j*th unit vector of length *s*. Then

$$d\mathbf{B}(:,j) = \left(\mathbf{e}\_j^{\mathsf{T}} \otimes \mathbf{I}\_s\right) d\mathbf{vec} \,\mathbf{B}.\tag{12.49}$$

Similarly, row *i* of **B** is **B**(*i*, :) = **e**<sub>*i*</sub><sup>T</sup>**B**, and

$$d\text{vec}\,\mathbf{B}(i,:) = \left(\mathbf{I}\_s \otimes \mathbf{e}\_i^{\sf T}\right) d\text{vec}\,\mathbf{B} \tag{12.50}$$

where **e***<sup>i</sup>* is the *i*th unit vector of length *a*. The derivative of **B** in (12.49) and (12.50) is

$$d\operatorname{vec}\mathbf{B} = \left(\mathbf{N}\_1^{\mathsf{T}} \otimes \mathbf{I}\right) d\operatorname{vec}\mathbf{M} + \left(\mathbf{N}\_1^{\mathsf{T}} \otimes \mathbf{B}\right) d\operatorname{vec}\mathbf{U}.\tag{12.51}$$

**Derivations** Differentiating (12.48) yields

$$d\mathbf{B} = \left(d\mathbf{M}\right)\mathbf{N}\_1 + \mathbf{M}\left(d\mathbf{N}\_1\right). \tag{12.52}$$

Applying the vec operator and simplifying gives

$$d\mathbf{vec}\,\mathbf{B} = \left(\mathbf{N}\_1^{\mathsf{T}} \otimes \mathbf{I}\right) d\mathbf{vec}\,\mathbf{M} + \left(\mathbf{I} \otimes \mathbf{M}\right) d\mathbf{vec}\,\mathbf{N}\_1.\tag{12.53}$$

Substituting (12.7) for *d*vec **N**1 and simplifying gives (12.51).
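A minimal numerical sketch of (12.48), with made-up rates chosen so that the columns of **Q** (transient block **U** stacked over **M**) sum to zero:

```python
import numpy as np

# Toy chain: s = 2 transient and a = 2 absorbing states (made-up rates).
U = np.array([[-0.5, 0.0],
              [ 0.2, -0.4]])
M = np.array([[0.1, 0.3],
              [0.2, 0.1]])                 # a x s absorption rates
N1 = np.linalg.inv(-U)                     # fundamental matrix
B = M @ N1                                 # Eq. (12.48): b_ij = P[absorb in i | start in j]
```

Each column of **B** is a probability distribution over the absorbing states, so the column sums are exactly 1.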

#### **12.5 The Embedded Chain: Discrete Transitions Within a Continuous Process**

If a continuous-time chain is observed only at the moments when it changes state, the result is a discrete-time process called the embedded Markov chain, or the jump chain, associated with **Q** (Iosifescu 1980, Section 8.3.2). The transition matrix of this embedded chain can be written

$$
\widehat{\mathbf{P}} = \left(\begin{array}{c|c} \widehat{\mathbf{U}} & \mathbf{0} \\ \hline \widehat{\mathbf{M}} & \mathbf{I}\_a \end{array}\right) \tag{12.54}
$$

where

$$
\widehat{\mathbf{U}} = \mathbf{I}\_s - \mathbf{U} \mathbf{U}\_{\mathrm{dg}}^{-1} \tag{12.55}
$$

$$
\widehat{\mathbf{M}} = -\mathbf{M} \mathbf{U}\_{\mathrm{dg}}^{-1} . \tag{12.56}
$$

The embedded chain provides information on the number of visits to each transient state, rather than the time spent in each transient state. The expected numbers of such visits are given by the fundamental matrix

$$
\widehat{\mathbf{N}}\_{\mathrm{I}} = \left(\mathbf{I} - \widehat{\mathbf{U}}\right)^{-1}.\tag{12.57}
$$

The sensitivity analysis of the embedded chain follows directly from the discrete-time results in previous chapters (Chaps. 4 and 5).
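A short sketch of the embedded-chain construction (12.55) and (12.57), using a made-up 2-state **U**:

```python
import numpy as np

# Made-up 2-state transient block of a continuous-time chain.
U = np.array([[-0.5, 0.0],
              [ 0.2, -0.4]])
s = U.shape[0]
Udg_inv = np.diag(1.0 / np.diag(U))         # U_dg^{-1}
Uhat = np.eye(s) - U @ Udg_inv              # Eq. (12.55): jump-chain transition probs
Nhat1 = np.linalg.inv(np.eye(s) - Uhat)     # Eq. (12.57): expected numbers of visits
```

Here the jump probability from state 1 to state 2 is 0.2/0.5 = 0.4; because no state can be re-entered in this example, `Nhat1[1, 0]` is also the probability of ever visiting state 2.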

In particular, the differential of $\widehat{\mathbf{N}}\_1$ is (Caswell 2006)

$$d\text{vec}\,\widehat{\mathbf{N}}\_{\text{l}} = \left(\widehat{\mathbf{N}}\_{\text{l}}^{\mathsf{T}} \otimes \widehat{\mathbf{N}}\_{\text{l}}\right) d\text{vec}\,\widehat{\mathbf{U}}.\tag{12.58}$$

However, this derivative is unlikely to be the sensitivity we are looking for. The continuous-time chain is likely to be parameterized in terms of the rate matrices **U** and **M**, rather than the probability matrices $\widehat{\mathbf{U}}$ and $\widehat{\mathbf{M}}$. To express the perturbation analysis of $\widehat{\mathbf{P}}$ in terms of the parameters of **Q** requires the derivatives of the embedded chain with respect to the continuous chain; i.e.,

$$\frac{d\operatorname{vec}\widehat{\mathbf{U}}}{d\operatorname{vec}^{\mathsf{T}}\mathbf{U}}\quad\text{and}\quad\frac{d\operatorname{vec}\widehat{\mathbf{M}}}{d\operatorname{vec}^{\mathsf{T}}\mathbf{M}}.$$

These derivatives are

$$d\operatorname{vec}\widehat{\mathbf{U}} = \left[ -\left( \mathbf{U}\_{\mathrm{dg}}^{-1} \otimes \mathbf{I}\_{s} \right) + \left( \mathbf{U}\_{\mathrm{dg}}^{-1} \otimes \mathbf{U} \mathbf{U}\_{\mathrm{dg}}^{-1} \right) \mathcal{D}\left( \operatorname{vec} \mathbf{I}\_{s} \right) \right] d\operatorname{vec}\mathbf{U} \quad (12.59)$$

$$d\operatorname{vec}\widehat{\mathbf{M}} = -\left( \mathbf{U}\_{\mathrm{dg}}^{-1} \otimes \mathbf{I}\_{a} \right) d\operatorname{vec}\mathbf{M} + \left( \mathbf{I}\_{s} \otimes \mathbf{M} \right) \left( \mathbf{U}\_{\mathrm{dg}}^{-1} \otimes \mathbf{U}\_{\mathrm{dg}}^{-1} \right) \mathcal{D}\left( \operatorname{vec} \mathbf{I}\_{s} \right) d\operatorname{vec}\mathbf{U} . \tag{12.60}$$

Using (12.58) and (12.59), one can write

$$\frac{d\operatorname{vec}\,\widehat{\mathbf{N}}\_1}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\widehat{\mathbf{N}}\_1^{\mathsf{T}} \otimes \widehat{\mathbf{N}}\_1\right) \frac{d\operatorname{vec}\,\widehat{\mathbf{U}}}{d\operatorname{vec}^{\mathsf{T}}\mathbf{U}} \,\frac{d\operatorname{vec}\,\mathbf{U}}{d\boldsymbol{\theta}^{\mathsf{T}}}.\tag{12.61}$$

**Derivation** Differentiate $\widehat{\mathbf{U}}$ in (12.55),

$$d\widehat{\mathbf{U}} = -\left(d\mathbf{U}\right)\mathbf{U}\_{\mathrm{dg}}^{-1} - \mathbf{U}\left(d\mathbf{U}\_{\mathrm{dg}}^{-1}\right),\tag{12.62}$$

apply the vec operator, and use (2.82) and (11.12) for $d\operatorname{vec}\mathbf{U}\_{\mathrm{dg}}^{-1}$. The result is

$$\begin{split} d\text{vec}\,\widehat{\mathbf{U}} &= -\left[ \left( \mathbf{U}\_{\text{dg}}^{-1} \right)^{\mathsf{T}} \otimes \mathbf{I}\_{s} \right] d\text{vec}\,\mathbf{U} - \left( \mathbf{I}\_{s} \otimes \mathbf{U} \right) d\text{vec}\,\mathbf{U}\_{\text{dg}}^{-1} \\ &= -\left( \mathbf{U}\_{\text{dg}}^{-1} \otimes \mathbf{I}\_{s} \right) d\text{vec}\,\mathbf{U} + \left( \mathbf{I}\_{s} \otimes \mathbf{U} \right) \left( \mathbf{U}\_{\text{dg}}^{-1} \otimes \mathbf{U}\_{\text{dg}}^{-1} \right) \mathcal{D}\left( \text{vec}\,\mathbf{I}\_{s} \right) d\text{vec}\,\mathbf{U} \end{split}$$

which simplifies to give (12.59). Similarly, differentiating $\widehat{\mathbf{M}}$ in (12.56) and applying the vec operator gives

$$d\operatorname{vec}\widehat{\mathbf{M}} = -\left(\mathbf{U}\_{\mathrm{dg}}^{-1}\otimes\mathbf{I}\_{a}\right)d\operatorname{vec}\mathbf{M} - \left(\mathbf{I}\_{s}\otimes\mathbf{M}\right)d\operatorname{vec}\mathbf{U}\_{\mathrm{dg}}^{-1}.\tag{12.63}$$

Using (2.82) and (11.12) for $d\operatorname{vec}\mathbf{U}\_{\mathrm{dg}}^{-1}$ and simplifying gives (12.60).
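Formula (12.59) can likewise be checked numerically; the sketch below compares the Jacobian of vec $\widehat{\mathbf{U}}$ with a finite difference on a made-up 2-state **U**.

```python
import numpy as np

# Numerical check of Eq. (12.59); vec stacks columns, so order='F' is used.
U = np.array([[-0.5, 0.0],
              [ 0.2, -0.4]])
s = U.shape[0]

def uhat(U):
    # Eq. (12.55): Uhat = I - U * U_dg^{-1}
    return np.eye(s) - U @ np.diag(1.0 / np.diag(U))

Udg_inv = np.diag(1.0 / np.diag(U))
D = np.diag(np.eye(s).flatten(order="F"))       # D(vec I_s)
# Eq. (12.59): Jacobian of vec Uhat with respect to vec U
J = -np.kron(Udg_inv, np.eye(s)) + np.kron(Udg_inv, U @ Udg_inv) @ D

h = 1e-7
i, j = 0, 0                                     # perturb the diagonal entry u_{11}
Uh = U.copy(); Uh[i, j] += h
fd = (uhat(Uh) - uhat(U)).flatten(order="F") / h
assert np.allclose(fd, J[:, j * s + i], atol=1e-5)
```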

#### **12.6 An Example: A Model of Disease Progression**

An important area of application of continuous-time Markov chains is the modelling of transitions among disease states. In this context, the time to absorption is longevity, and the time spent in various transient states has implications for the quality of life during the disease. Fix and Neyman (1951) introduced the idea and proposed a 4-state model for cancer, with two transient states (under treatment or not) and two absorbing states (death from cancer or from other causes). Kay (1986) proposed a model with *k* disease states and an absorbing state representing death. There is now a large literature on such models and their estimation. Recently, studies have proliferated that use Markov chain models of disease progression to explore the cost-effectiveness of screening and treatment procedures (e.g., Kuo et al. 1999; Chen et al. 1999; Wu et al. 2006; Sonnenberg and Beck 1993).

Sensitivity analysis reveals how these demographic properties respond to changes in parameters. As an example, I consider a model for the progression of colorectal cancer (CRC) that was developed to study the cost-effectiveness of a new CRC screening technique based on DNA testing of stool samples (Wu et al. 2006). The model includes 7 transient states (normal, small and large adenoma, early and late preclinical CRC, and early and late clinical CRC) and 2 absorbing states (death from CRC and death from other causes); see Fig. 12.1. Parameters were estimated from the literature and from clinical studies in Taiwan.

This model, which describes the so-called natural history of the disease, was embedded in a larger decision model to compare the cost-effectiveness of screening strategies. The intensity matrix (12.1) corresponding to Fig. 12.1 is

**Fig. 12.1** State transition diagram for an absorbing Markov chain model of colorectal cancer (CRC) progression. The model includes 7 transient states based on the stage of development of adenoma (polyps) or cancer, and two absorbing states corresponding to death from CRC and death from other causes (OCD). Transition rates are given by *λi*, and mortality rate from other causes by *μ*. (Modified, under the terms of a Creative Commons Attribution License, from Figure 1 of Wu et al. 2006)

$$\mathbf{Q} = \left(\begin{array}{ccccccc|cc}
-\lambda\_1-\mu & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\lambda\_1 & -\lambda\_2-\mu & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \lambda\_2 & -\lambda\_3-\mu & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \lambda\_3 & -\lambda\_4-\lambda\_5-\mu & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \lambda\_4 & -\lambda\_6-\mu & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \lambda\_5 & 0 & -\lambda\_7-\mu & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \lambda\_6 & 0 & -\lambda\_8-\mu & 0 & 0 \\ \hline
0 & 0 & 0 & 0 & 0 & \lambda\_7 & \lambda\_8 & 0 & 0 \\
\mu & \mu & \mu & \mu & \mu & \mu & \mu & 0 & 0
\end{array}\right). \tag{12.64}$$

The *λi* are transition rates; *μ* is the mortality rate from other causes of death. The incidence rate of small adenoma (*λ*1) and the mortality rate due to other causes of death (*μ*) are age-dependent. Here I have analyzed values for age 70, based on figures in Wu et al. (2006). This leads to a parameter vector (all rates are per year):

$$\boldsymbol{\theta} = \begin{pmatrix} \lambda\_1 \\ \vdots \\ \lambda\_8 \\ \mu \end{pmatrix} = \begin{pmatrix} 1.52 \times 10^{-2} \\ 3.46 \times 10^{-2} \\ 2.15 \times 10^{-2} \\ 3.70 \times 10^{-1} \\ 2.38 \times 10^{-1} \\ 4.85 \times 10^{-1} \\ 3.02 \times 10^{-2} \\ 2.10 \times 10^{-1} \\ 2.20 \times 10^{-2} \end{pmatrix} . \tag{12.65}$$

#### *12.6.1 Sensitivity Results*

The fundamental matrix (12.2) is

$$\mathbf{N}\_1 = \begin{pmatrix} 26.9 & 0 & 0 & 0 & 0 & 0 & 0 \\ 7.2 & 17.7 & 0 & 0 & 0 & 0 & 0 \\ 5.7 & 14.0 & 23.0 & 0 & 0 & 0 & 0 \\ 0.2 & 0.5 & 0.8 & 1.6 & 0 & 0 & 0 \\ 0.1 & 0.4 & 0.6 & 1.2 & 2.0 & 0 & 0 \\ 0.9 & 2.2 & 3.6 & 7.2 & 0 & 19.2 & 0 \\ 0.3 & 0.7 & 1.2 & 2.4 & 4.1 & 0 & 4.3 \end{pmatrix} . \tag{12.66}$$

Thus, given these rates, a 70-year-old individual in the normal state could expect to spend 27 years in stage 1, and only 0.9 and 0.3 years in stages 6 and 7 (early and late clinical CRC).<sup>1</sup> Individuals in more advanced stages can expect to spend progressively longer periods in stages 6 and 7 (compare across rows 6 and 7 of **N**1).
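These values can be reproduced by assembling the transient block **U** of (12.64) from the rates in (12.65); a sketch:

```python
import numpy as np

# Transient block U of Eq. (12.64), built from the rates of Eq. (12.65).
l1, l2, l3, l4, l5, l6, l7, l8 = (1.52e-2, 3.46e-2, 2.15e-2, 3.70e-1,
                                  2.38e-1, 4.85e-1, 3.02e-2, 2.10e-1)
mu = 2.20e-2

U = np.zeros((7, 7))
np.fill_diagonal(U, [-l1 - mu, -l2 - mu, -l3 - mu, -l4 - l5 - mu,
                     -l6 - mu, -l7 - mu, -l8 - mu])
U[1, 0], U[2, 1], U[3, 2] = l1, l2, l3     # adenoma progression
U[4, 3], U[5, 3], U[6, 4] = l4, l5, l6     # preclinical and clinical CRC
N1 = np.linalg.inv(-U)                      # fundamental matrix, Eq. (12.66)
```

This recovers `N1[0, 0]` ≈ 26.9 years in the normal state; the column sums of **N**1 give the life expectancies reported in (12.70).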

The standard deviations (12.13) of the times spent in the transient states are

$$SD\left(\nu\_{ij}\right) = \begin{pmatrix} 26.9 & 0 & 0 & 0 & 0 & 0 & 0 \\ 14.2 & 17.7 & 0 & 0 & 0 & 0 & 0 \\ 15.2 & 21.2 & 23.0 & 0 & 0 & 0 & 0 \\ 0.8 & 1.1 & 1.4 & 1.6 & 0 & 0 & 0 \\ 0.7 & 1.1 & 1.4 & 1.8 & 2.0 & 0 & 0 \\ 5.8 & 8.9 & 11.2 & 15.0 & 0 & 19.2 & 0 \\ 1.6 & 2.4 & 3.0 & 3.9 & 4.3 & 0 & 4.3 \end{pmatrix} . \tag{12.67}$$

Clearly, considerable variation can be expected in the times spent in the various states; the standard deviation equals or exceeds the mean in every case.

Considering the sensitivity analysis of the time spent in transient states, focus on the fate of a normal (state 1) individual. The expected times spent in each state by such an individual are given by **N**1(:, 1). From (12.7) and (2.55), the sensitivity and elasticity of **N**1(:, 1) are

$$\frac{d\mathbf{N}\_1(:,1)}{d\boldsymbol{\theta}^{\mathsf{T}}} = \begin{pmatrix} -722.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -722.6 \\ 280.9 & -127.5 & 0 & 0 & 0 & 0 & 0 & 0 & -321.6 \\ 223.4 & 64.5 & -132.0 & 0 & 0 & 0 & 0 & 0 & -387.8 \\ 7.6 & 2.2 & 4.6 & -0.3 & -0.3 & 0 & 0 & 0 & -13.5 \\ 5.6 & 1.6 & 3.4 & 0.2 & -0.2 & -0.3 & 0 & 0 & -10.2 \\ 34.8 & 10.0 & 21.0 & -1.4 & 2.3 & 0 & -17.1 & 0 & -79.0 \\ 11.6 & 3.4 & 7.0 & 0.3 & -0.5 & 0 & 0 & -1.3 & -22.5 \end{pmatrix}$$

$$\frac{\epsilon\mathbf{N}\_1(:,1)}{\epsilon\boldsymbol{\theta}^{\mathsf{T}}} = \begin{pmatrix} -0.4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -0.6 \\ 0.6 & -0.6 & 0 & 0 & 0 & 0 & 0 & 0 & -1.0 \\ 0.6 & 0.4 & -0.5 & 0 & 0 & 0 & 0 & 0 & -1.5 \\ 0.6 & 0.4 & 0.5 & -0.6 & -0.4 & 0 & 0 & 0 & -1.5 \\ 0.6 & 0.4 & 0.5 & 0.4 & -0.4 & -1.0 & 0 & 0 & -1.5 \\ 0.6 & 0.4 & 0.5 & -0.6 & 0.6 & 0 & -0.6 & 0 & -1.9 \\ 0.6 & 0.4 & 0.5 & 0.4 & -0.4 & 0.0 & 0 & -0.9 & -1.7 \end{pmatrix} .\tag{12.68}$$

These elasticities imply that a 1% increase in *λ*<sup>1</sup> will (to first order) cause about a 0.4% decrease in the mean time spent in the normal state and a 0*.*6% increase in the mean time spent in each other state. A 1% increase in *λ*<sup>4</sup> (the rate of transition between early and late preclinical CRC) creates a 0*.*6% decrease in the time spent

<sup>1</sup>This calculation holds the mortality rate fixed at its values at age 70; in reality it increases with age. Wu et al. (2006) included age variation by providing values of *λ*<sup>1</sup> (the rate of progression from normal to small adenoma) specific to 5-year intervals from 50 to 70 years of age; all other parameters were age-invariant.

in stages 4 and 6 (the early CRC stages) and a 0.4% increase in the time spent in stages 5 and 7 (the late CRC stages). An increase in the mortality rate *μ* due to other causes of death reduces the time spent in any of the transient states.

The elasticity of the variance in the time spent in the transient states by an individual in state 1 is

$$\frac{\epsilon V(\boldsymbol{\nu}\_1)}{\epsilon \boldsymbol{\theta}^{\mathsf{T}}} = \begin{pmatrix} -0.8 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1.2 \\ 0.4 & -1.2 & 0 & 0 & 0 & 0 & 0 & 0 & -1.2 \\ 0.5 & 0.3 & -1.0 & 0 & 0 & 0 & 0 & 0 & -1.8 \\ 0.5 & 0.4 & 0.5 & -1.2 & -0.8 & 0 & 0 & 0 & -1.5 \\ 0.6 & 0.4 & 0.5 & 0.4 & -0.4 & -1.9 & 0 & 0 & -1.6 \\ 0.6 & 0.4 & 0.5 & -0.6 & 0.6 & 0 & -1.2 & 0 & -2.3 \\ 0.6 & 0.4 & 0.5 & 0.4 & -0.4 & 0.0 & 0 & -1.8 & -1.7 \end{pmatrix} . \tag{12.69}$$

The sign pattern is the same as that of the elasticities of the mean times in (12.68), so we conclude that any parameter change that increases the mean time spent in a transient state will also increase the variance in that time. The elasticities of the variance are comparable to those of the mean (cf. (12.68) and (12.69)), showing that the means and variances respond with roughly equal proportional changes.

Longevity is measured by the time to absorption, and is a primary concern in analyses of screening or treatment protocols. The vectors of the mean, standard deviation, and coefficient of variation of longevity are

$$\eta\_1 = \begin{pmatrix} 41.4\\ 35.5\\ 29.1\\ 12.4\\ 6.1\\ 19.2\\ 4.3 \end{pmatrix} \quad SD(\eta) = \begin{pmatrix} 37.4\\ 30.3\\ 25.8\\ 14.1\\ 4.7\\ 19.2\\ 4.3 \end{pmatrix} \quad CV(\eta) = \begin{pmatrix} 0.9\\ 0.9\\ 0.9\\ 1.1\\ 0.8\\ 1.0\\ 1.0 \end{pmatrix}. \tag{12.70}$$
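The entries of (12.70) follow from the moment formulas (12.24), (12.25), (12.29), and (12.30); a sketch that rebuilds **U** from the rates of (12.65):

```python
import numpy as np

# Rebuild the transient block U of Eq. (12.64) and compute longevity statistics.
l1, l2, l3, l4, l5, l6, l7, l8 = (1.52e-2, 3.46e-2, 2.15e-2, 3.70e-1,
                                  2.38e-1, 4.85e-1, 3.02e-2, 2.10e-1)
mu = 2.20e-2
U = np.zeros((7, 7))
np.fill_diagonal(U, [-l1 - mu, -l2 - mu, -l3 - mu, -l4 - l5 - mu,
                     -l6 - mu, -l7 - mu, -l8 - mu])
U[1, 0], U[2, 1], U[3, 2] = l1, l2, l3
U[4, 3], U[5, 3], U[6, 4] = l4, l5, l6

N1 = np.linalg.inv(-U)
ones = np.ones(7)
eta1 = ones @ N1                    # Eq. (12.24): life expectancy by state
eta2 = 2 * ones @ (N1 @ N1)         # Eq. (12.25): second moment
SD = np.sqrt(eta2 - eta1 * eta1)    # Eqs. (12.29)-(12.30)
CV = SD / eta1                      # coefficient of variation of longevity
```

For the normal state this reproduces the first entries of (12.70): mean ≈ 41.4, SD ≈ 37.4, CV ≈ 0.9.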

The sensitivity and elasticity of expected longevity (life expectancy) with respect to *θ* are

$$
\frac{d\boldsymbol{\eta}\_1}{d\boldsymbol{\theta}^{\mathsf{T}}} = \begin{pmatrix}
-158.7 & -45.8 & -96.0 & -1.2 & 1.3 & -0.2 & -17.1 & -1.3 & -1557.2 \\
0 & -112.2 & -234.9 & -3.0 & 3.2 & -0.6 & -41.9 & -3.2 & -1089.1 \\
0 & 0 & -384.2 & -5.0 & 5.3 & -1.0 & -68.6 & -5.2 & -756.5 \\
0 & 0 & 0 & -10.0 & 10.7 & -2.1 & -138.8 & -10.4 & -176.0 \\
0 & 0 & 0 & 0 & 0 & -3.5 & 0 & -17.8 & -29.8 \\
0 & 0 & 0 & 0 & 0 & 0 & -367.0 & 0 & -367.0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -18.6 & -18.6
\end{pmatrix}
$$

$$\frac{\epsilon\boldsymbol{\eta}\_1}{\epsilon\boldsymbol{\theta}^{\mathsf{T}}} = \begin{pmatrix} -0.06 & -0.04 & -0.05 & -0.01 & 0.01 & -0.00 & -0.01 & -0.01 & -0.83 \\ 0 & -0.11 & -0.14 & -0.03 & 0.02 & -0.01 & -0.04 & -0.02 & -0.68 \\ 0 & 0 & -0.28 & -0.06 & 0.04 & -0.02 & -0.07 & -0.04 & -0.57 \\ 0 & 0 & 0 & -0.30 & 0.21 & -0.08 & -0.34 & -0.18 & -0.31 \\ 0 & 0 & 0 & 0 & 0 & -0.28 & 0 & -0.61 & -0.11 \\ 0 & 0 & 0 & 0 & 0 & 0 & -0.58 & 0 & -0.42 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & -0.91 & -0.09 \end{pmatrix} . \tag{12.71}$$

Almost all the nonzero elements are negative, because increasing any of the rates leading towards clinical CRC reduces life expectancy, as does increasing the mortality rate due to other causes of death. The exceptions are the sensitivities and elasticities of *η*<sup>1</sup> to *λ*<sup>5</sup> (in column 5 of these matrices), which are positive because *λ*<sup>5</sup> delays the onset of clinical CRC (cf. Fig. 12.1).

The elasticities of *E(η*1*)*, the life expectancy of a normal individual, to a change in *θ*, appear in the first row of (12.71). The largest of these (except for the last column, representing mortality from other causes of death) are to changes in *λ*1, *λ*2, and *λ*3, the rates of transition from normal to small adenoma, small to large adenoma, and large adenoma to preclinical CRC. The rates *λ*<sup>2</sup> and *λ*<sup>3</sup> have large effects on *E(η*2*)*, and *λ*<sup>3</sup> has a large effect on *E(η*3*)*. These transitions are targets of screening and early treatment; this analysis quantifies the effect that such interventions could have.

The sensitivity and elasticity of the standard deviation of longevity are

$$\frac{dSD}{d\boldsymbol{\theta}^{\mathsf{T}}} = \begin{pmatrix} -0.27 & -0.07 & -0.16 & -0.00 & 0.00 & -0.00 & -0.03 & -0.00 & -1.19 \\ 0 & -0.13 & -0.31 & -0.00 & 0.00 & -0.00 & -0.06 & -0.00 & -0.76 \\ 0 & 0 & -0.43 & -0.00 & 0.00 & -0.00 & -0.09 & -0.00 & -0.61 \\ 0 & 0 & 0 & -0.01 & 0.01 & 0.00 & -0.27 & 0.00 & -0.27 \\ 0 & 0 & 0 & 0 & 0 & -0.00 & 0.00 & -0.02 & -0.02 \\ 0 & 0 & 0 & 0 & 0 & 0 & -0.37 & 0 & -0.37 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & -0.02 & -0.02 \end{pmatrix} \times 10^{3} \tag{12.72}$$

and

$$\frac{\epsilon SD\left(\eta\right)}{\epsilon \boldsymbol{\theta}^{\mathsf{T}}} = \begin{pmatrix} -0.11 & -0.06 & -0.09 & -0.02 & 0.01 & -0.00 & -0.02 & -0.01 & -0.70 \\ 0 & -0.15 & -0.22 & -0.04 & 0.03 & -0.00 & -0.06 & -0.01 & -0.55 \\ 0 & 0 & -0.36 & -0.05 & 0.05 & -0.00 & -0.11 & -0.01 & -0.52 \\ 0 & 0 & 0 & -0.23 & 0.23 & 0.01 & -0.58 & 0.00 & -0.43 \\ 0 & 0 & 0 & 0 & 0 & -0.16 & 0.00 & -0.75 & -0.09 \\ 0 & 0 & 0 & 0 & 0 & 0 & -0.58 & 0 & -0.42 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & -0.91 & -0.09 \end{pmatrix} . \tag{12.73}$$

These have the same sign pattern as the sensitivity of *η*1, indicating that any increase in life expectancy will be accompanied by an increase in the variance of longevity. The coefficient of variation takes this joint change into account; from (12.39),

$$\frac{\epsilon \, CV \, (\eta)}{\epsilon \, \theta^{\mathsf{T}}} = \begin{pmatrix} 0.04 & 0.02 & 0.03 & 0.00 & -0.00 & -0.00 & 0.01 & -0.00 & -0.31 \\ 0 & -0.00 & 0.02 & -0.01 & 0.00 & -0.01 & 0.01 & -0.01 & -0.38 \\ 0 & 0 & -0.01 & -0.03 & 0.01 & -0.02 & 0.01 & -0.04 & -0.21 \\ 0.00 & 0.00 & 0.00 & -0.00 & -0.07 & -0.08 & 0.32 & -0.14 & 0.19 \\ 0 & 0 & 0 & 0.00 & 0.00 & -0.30 & 0.00 & -0.27 & -0.09 \\ 0 & 0 & 0 & 0 & 0 & 0.00 & 0.00 & 0.00 & 0.00 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.00 & 0.00 \end{pmatrix} . \tag{12.74}$$

Most of these elasticities are small, suggesting that the mean and standard deviation respond roughly proportionally, so that the *CV* does not change much.

The matrix **B** in (12.48), giving the ultimate probability of death from CRC (row 1) or other causes of death (row 2) is

$$\mathbf{B} = \begin{pmatrix} 0.1 & 0.2 & 0.4 & 0.7 & 0.9 & 0.6 & 0.9 \\ 0.9 & 0.8 & 0.6 & 0.3 & 0.1 & 0.4 & 0.1 \end{pmatrix}.\tag{12.75}$$
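The matrix **B** can be reproduced from (12.48), building **M** from *λ*7, *λ*8, and *μ* as in (12.64); a sketch:

```python
import numpy as np

# Rebuild U and M for the CRC model and compute B = M N1 (Eq. 12.48).
l1, l2, l3, l4, l5, l6, l7, l8 = (1.52e-2, 3.46e-2, 2.15e-2, 3.70e-1,
                                  2.38e-1, 4.85e-1, 3.02e-2, 2.10e-1)
mu = 2.20e-2
U = np.zeros((7, 7))
np.fill_diagonal(U, [-l1 - mu, -l2 - mu, -l3 - mu, -l4 - l5 - mu,
                     -l6 - mu, -l7 - mu, -l8 - mu])
U[1, 0], U[2, 1], U[3, 2] = l1, l2, l3
U[4, 3], U[5, 3], U[6, 4] = l4, l5, l6

M = np.vstack([[0, 0, 0, 0, 0, l7, l8],     # row 1: death from CRC
               np.full(7, mu)])             # row 2: death from other causes
B = M @ np.linalg.inv(-U)
```

The columns of **B** sum to 1, and `B[0, 0]` ≈ 0.09: under these rates, a 70-year-old in the normal state faces roughly a 9% lifetime risk of death from CRC.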

Focusing on the probability of death due to CRC, the sensitivity and elasticity, from (12.50), are

$$\frac{d\operatorname{vec}\mathbf{B}(1,:)}{d\boldsymbol{\theta}^{\mathsf{T}}} = \begin{pmatrix} 3.5 & 1.0 & 2.1 & 0.0 & -0.0 & 0.0 & 0.4 & 0.0 & -7.1 \\ 0 & 2.5 & 5.2 & 0.1 & -0.1 & 0.0 & 0.9 & 0.1 & -11.5 \\ 0 & 0 & 8.5 & 0.1 & -0.1 & 0.0 & 1.5 & 0.1 & -12.5 \\ 0 & 0 & 0 & 0.2 & -0.2 & 0.0 & 3.1 & 0.2 & -8.5 \\ 0 & 0 & 0 & 0 & 0 & 0.1 & 0 & 0.4 & -5.4 \\ 0 & 0 & 0 & 0 & 0 & 0 & 8.1 & 0 & -11.1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.4 & -3.9 \end{pmatrix}$$

$$\frac{\epsilon \operatorname{vec}\mathbf{B}(1,:)}{\epsilon \boldsymbol{\theta}^{\mathsf{T}}} = \begin{pmatrix} 0.6 & 0.4 & 0.5 & 0.1 & -0.1 & 0.0 & 0.1 & 0.1 & -1.7 \\ 0 & 0.4 & 0.5 & 0.1 & -0.1 & 0.0 & 0.1 & 0.1 & -1.2 \\ 0 & 0 & 0.5 & 0.1 & -0.1 & 0.0 & 0.1 & 0.1 & -0.8 \\ 0 & 0 & 0 & 0.1 & -0.1 & 0.0 & 0.1 & 0.0 & -0.3 \\ 0 & 0 & 0 & 0 & 0 & 0.0 & 0 & 0.1 & -0.1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0 & -0.4 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.1 & -0.1 \end{pmatrix} .$$

The probability of death from CRC could be reduced by increasing the mortality rate due to other causes (last column), although this is not an attractive treatment option. A more useful interpretation of the last column is as an indication of the increase in death from CRC that would result from reducing other causes of death.

For normal individuals, the risk of death from CRC is most elastic to changes in *λ*1, *λ*2, and *λ*3 (row 1). The row sums of the elasticity matrix, corresponding to the effects of a proportional change in all rates, are zero because a change of time scale does not affect the probability of absorption.

#### *12.6.2 Sensitivity of the Embedded Chain*

The transition matrix $\widehat{\mathbf{P}}$ in (12.54) for the embedded chain is

$$
\widehat{\mathbf{P}} = \left(\begin{array}{ccccccc|cc}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.41 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.61 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.49 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.59 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.38 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.96 & 0 & 0 & 0 & 0 \\ \hline
0 & 0 & 0 & 0 & 0 & 0.58 & 0.91 & 1 & 0 \\
0.59 & 0.39 & 0.51 & 0.03 & 0.04 & 0.42 & 0.09 & 0 & 1
\end{array}\right). \tag{12.76}
$$

The fundamental matrix $\widehat{\mathbf{N}}\_1$ from (12.57) is

$$
\widehat{\mathbf{N}}\_1 = \begin{pmatrix}
1.0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 1.0 & 0 & 0 & 0 & 0 & 0 \\
0.2 & 0.6 & 1.0 & 0 & 0 & 0 & 0 \\
0.1 & 0.3 & 0.5 & 1.0 & 0 & 0 & 0 \\
0.1 & 0.2 & 0.3 & 0.6 & 1.0 & 0 & 0 \\
0.1 & 0.1 & 0.2 & 0.4 & 0 & 1.0 & 0 \\
0.1 & 0.2 & 0.3 & 0.6 & 1.0 & 0 & 1.0
\end{pmatrix}.\tag{12.77}
$$

In this continuous-time chain, states cannot be re-entered (cf. Fig. 12.1). Because a state can be visited at most once, the mean number of visits is also the probability of ever entering the state. Thus the probabilities that a normal individual will ever suffer early or late clinical CRC are $\widehat{\mathbf{N}}\_1(6,1) = 0.1$ and $\widehat{\mathbf{N}}\_1(7,1) = 0.07$, respectively. These probabilities increase for individuals in successively later stages; for an individual with large adenoma the probabilities are $\widehat{\mathbf{N}}\_1(6,3) = 0.2$ and $\widehat{\mathbf{N}}\_1(7,3) = 0.3$, respectively.

Focusing sensitivity analysis on individuals in the normal state (state 1), the sensitivities and elasticities of the number of visits are

$$\frac{d\widehat{\mathbf{N}}\_1(:,1)}{d\boldsymbol{\theta}^{\mathsf{T}}} = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 15.9 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -11.0 \\ 9.7 & 2.8 & 0 & 0 & 0 & 0 & 0 & 0 & -11.1 \\ 4.8 & 1.4 & 2.9 & 0 & 0 & 0 & 0 & 0 & -8.3 \\ 2.8 & 0.8 & 1.7 & 0.1 & -0.1 & 0 & 0 & 0 & -5.0 \\ 1.8 & 0.5 & 1.1 & -0.1 & 0.1 & 0 & 0 & 0 & -3.2 \\ 2.7 & 0.8 & 1.6 & 0.1 & -0.1 & 0.0 & 0 & 0 & -4.9 \end{pmatrix} \tag{12.78}$$

and

$$
\frac{\epsilon\widehat{\mathbf{N}}\_1(:,1)}{\epsilon\boldsymbol{\theta}^{\mathsf{T}}} = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -0.6 \\
0.6 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 & -1.0 \\
0.6 & 0.4 & 0.5 & 0 & 0 & 0 & 0 & 0 & -1.5 \\
0.6 & 0.4 & 0.5 & 0.4 & -0.4 & 0 & 0 & 0 & -1.5 \\
0.6 & 0.4 & 0.5 & -0.6 & 0.6 & 0 & 0 & 0 & -1.5 \\
0.6 & 0.4 & 0.5 & 0.4 & -0.4 & 0.0 & 0 & 0 & -1.5
\end{pmatrix} . \tag{12.79}
$$

The sensitivities and elasticities of the probability of contracting clinical CRC are given by the last two rows. These probabilities are highly elastic to *λ*1, *λ*<sup>2</sup> and *λ*3. The elasticities to *μ* indicate that every 1% reduction in mortality due to other causes will cause about a 1*.*5% increase in the probability of experiencing clinical CRC.

#### **12.7 Discussion**

The results of this chapter have been presented in terms of differentials of, or derivatives with respect to, a general vector *θ* of parameters. The nature of these parameters and their relation to **Q**, **U**, or **M** can be very general. At its simplest, *θ* could consist of some subset of the elements of **Q**. This is the case in the CRC example (Sect. 12.6), in which the parameters are the transition rates *λi* and the mortality rate *μ*. More generally, the transition rates might themselves be written as functions of other variables. For example, in Van Den Hout and Matthews (2009a,b) the rates are written as $q\_{ij} = \exp\left(\boldsymbol{\beta}\_{ij}^{\mathsf{T}}\mathbf{z}\right)$, $i \neq j$, where **z** is a vector of covariates (e.g., age, medical care) and $\boldsymbol{\beta}\_{ij}$ is a vector of coefficients to be estimated. The results presented here can be applied directly to such cases, and indeed to even more complicated functional dependencies, using the chain rule. Thus, focusing on parametric dependence is not only scientifically valuable (these are, after all, the relationships of interest in applications of Markov chains) but also extremely general.

Epidemic models are often written as continuous-time Markov chains, specified in terms of rates of movement among infection states. Gómez-Corral and López-García (2018) extended the methods of this chapter to a model in which individuals are classified by two state variables (a level-dependent quasi-birth-death process). The model may be considered a continuous-time analog of the age×stage models of Chap. 6 (Caswell 2012; Caswell and Salguero-Gómez 2013; Caswell et al. 2018). Their approach takes advantage of the block structure of the intensity matrix for such processes. They have also applied the approach to receptor-ligand complexes within cells (López-García et al. 2018). As far removed from demography as molecules may seem, the concepts of i-state transitions, of inferring population behavior from individual trajectories, and of sensitivity analysis still apply. That's a good thing.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.